Global Edition

Software Engineering

Tenth Edition

Ian Sommerville

Boston Columbus Indianapolis New York San Francisco Hoboken

Amsterdam Cape Town Dubai London Madrid Milan Munich Paris

Montreal Toronto Delhi Mexico City São Paulo Sydney Hong Kong Seoul

Singapore Taipei Tokyo

Editorial Director: Marcia Horton
Editor in Chief: Michael Hirsch
Acquisitions Editor: Matt Goldstein
Editorial Assistant: Chelsea Bell
Assistant Acquisitions Editor, Global Edition: Murchana Borthakur
Associate Project Editor, Global Edition: Binita Roy
Managing Editor: Jeff Holcomb
Senior Production Project Manager: Marilyn Lloyd
Director of Marketing: Margaret Waples
Marketing Coordinator: Kathryn Ferranti
Senior Manufacturing Buyer: Carol Melville
Senior Manufacturing Controller, Production, Global Edition: Trudy Kimber
Text Designer: Susan Raymond
Cover Art Designer: Lumina Datamatics
Cover Image: © Andrey Bayda/Shutterstock
Interior Chapter Opener: © graficart.net/Alamy
Full-Service Project Management: Rashmi Tickyani, Aptara®, Inc.
Composition and Illustrations: Aptara®, Inc.

Pearson Education Limited

Edinburgh Gate

Harlow

Essex CM20 2JE

England

and Associated Companies throughout the world

Visit us on the World Wide Web at:

www.pearsonglobaleditions.com

© Pearson Education Limited 2016

The rights of Ian Sommerville to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.

Authorized adaptation from the United States edition, entitled Software Engineering, 10th edition, ISBN 978-0-13-394303-0, by Ian Sommerville, published by Pearson Education © 2016.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a license permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

ISBN 10: 1-292-09613-6

ISBN 13: 978-1-292-09613-1

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

10 9 8 7 6 5 4 3 2 1

Typeset in 9 New Aster LT Std by Aptara®, Inc.

Printed and bound by Courier Westford in the United States of America.

Preface

Progress in software engineering over the last 50 years has been astonishing. Our societies could not function without large professional software systems. National utilities and infrastructure—energy, communications and transport—all rely on complex and mostly reliable computer systems. Software has allowed us to explore space and to create the World Wide Web—the most significant information system in the history of mankind. Smartphones and tablets are ubiquitous and an entire ‘apps industry’ developing software for these devices has emerged in the past few years.

Humanity is now facing a demanding set of challenges—climate change and extreme weather, declining natural resources, an increasing world population to be fed and housed, international terrorism, and the need to help elderly people lead satisfying and fulfilled lives. We need new technologies to help us address these challenges and, for sure, software will have a central role in these technologies. Software engineering is, therefore, critically important for our future on this planet. We have to continue to educate software engineers and develop the discipline so that we meet the demand for more software and create the increasingly complex future systems that we need.

Of course, there are still problems with software projects. Systems are still sometimes delivered late and cost more than expected. We are creating increasingly complex software systems of systems and we should not be surprised that we encounter difficulties along the way. However, we should not let these problems conceal the real successes in software engineering and the impressive software engineering methods and technologies that have been developed.

This book, in different editions, has now been around for over 30 years and this edition is based around the essential principles that were established in the first edition:

1. I write about software engineering as it is practiced in industry, without taking an evangelical position on particular approaches such as agile development or formal methods. In reality, industry mixes techniques such as agile and plan-based development and this is reflected in the book.


2. I write about what I know and understand. I have had many suggestions for additional topics that might be covered in more detail such as open source development, the use of the UML and mobile software engineering. But I don’t really know enough about these areas. My own work has been in system dependability and in systems engineering and this is reflected in my selection of advanced topics for the book.

I believe that the key issues for modern software engineering are managing complexity, integrating agility with other methods and ensuring that our systems are secure and resilient. These issues have been the driver for the changes and additions in this new edition of my book.

Changes from the 9th edition

In summary, the major updates and additions in this book from the 9th edition are:

• I have extensively updated the chapter on agile software engineering, with new material on Scrum. I have updated other chapters as required to reflect the increasing use of agile methods of software engineering.

• I have added new chapters on resilience engineering, systems engineering, and systems of systems.

• I have completely reorganized three chapters covering reliability, safety, and security.

• I have added new material on RESTful services to the chapter covering service-oriented software engineering.

• I have revised and updated the chapter on configuration management with new material on distributed version control systems.

• I have moved chapters on aspect-oriented software engineering and process improvement from the print version of the book to the web site.

• New supplementary material has been added to the web site, including a set of supporting videos. I have explained key topics on video and recommended related YouTube videos.

The 4-part structure of the book, introduced in earlier editions, has been retained but I have made significant changes in each part of the book.

1. In Part 1, Introduction to software engineering, I have completely rewritten Chapter 3 (agile methods) and updated this to reflect the increasing use of Scrum. A new case study on a digital learning environment has been added to Chapter 1 and is used in a number of chapters. Legacy systems are covered in more detail in Chapter 9. Minor changes and updates have been made to all other chapters.


2. Part 2, which covers dependable systems, has been revised and restructured. Rather than an activity-oriented approach where information on safety, security and reliability is spread over several chapters, I have reorganized this so that each topic has a chapter in its own right. This makes it easier to cover a single topic, such as security, as part of a more general course. I have added a completely new chapter on resilience engineering, which covers cybersecurity, organizational resilience, and resilient systems design.

3. In Part 3, I have added new chapters on systems engineering and systems of systems and have extensively revised the material on service-oriented systems engineering to reflect the increasing use of RESTful services. The chapter on aspect-oriented software engineering has been deleted from the print version but remains available as a web chapter.

4. In Part 4, I have updated the material on configuration management to reflect the increasing use of distributed version control tools such as Git. The chapter on process improvement has been deleted from the print version but remains available as a web chapter.

An important change in the supplementary material for the book is the addition of video recommendations in all chapters. I have made over 40 videos on a range of topics that are available on my YouTube channel and linked from the book’s web pages. In cases where I have not made videos, I have recommended YouTube videos that may be useful.

I explain the rationale behind the changes that I’ve made in this short video:

http://software-engineering-book.com/videos/10th-edition-changes

Readership

The book is primarily aimed at university and college students taking introductory and advanced courses in software and systems engineering. I assume that readers understand the basics of programming and fundamental data structures.

Software engineers in industry may find the book useful as general reading and to update their knowledge on topics such as software reuse, architectural design, dependability and security, and systems engineering.

Using the book in software engineering courses

I have designed the book so that it can be used in three different types of software engineering course:

1. General introductory courses in software engineering. The first part of the book has been designed to support a 1-semester course in introductory software engineering. There are 9 chapters that cover fundamental topics in software engineering.


If your course has a practical component, management chapters in Part 4 may be substituted for some of these.

2. Introductory or intermediate courses on specific software engineering topics. You can create a range of more advanced courses using the chapters in Parts 2–4. For example, I have taught a course in critical systems using the chapters in Part 2 plus chapters on systems engineering and quality management. In a course covering software-intensive systems engineering, I used chapters on systems engineering, requirements engineering, systems of systems, distributed software engineering, embedded software, project management and project planning.

3. More advanced courses in specific software engineering topics. In this case, the chapters in the book form a foundation for the course. These are then supplemented with further reading that explores the topic in more detail. For example, a course on software reuse could be based around Chapters 15–18.

Instructors may access additional teaching support material from Pearson’s website. Some of this is password-protected and instructors using the book for teaching can obtain a password by registering at the Pearson website. The material available includes:

• Model answers to selected end-of-chapter exercises.

• Quiz questions and answers for each chapter.

You can access this material at:

www.pearsonglobaleditions.com/Sommerville

Book website

This book has been designed as a hybrid print/web text in which core information in the printed edition is linked to supplementary material on the web. Several chapters include specially written ‘web sections’ that add to the information in that chapter. There are also six ‘web chapters’ on topics that I have not covered in the print version of the book.

You can download a wide range of supporting material from the book’s website (software-engineering-book.com) including:

• A set of videos where I cover a range of software engineering topics. I also recommend other YouTube videos that can support learning.

• An instructor’s guide that gives advice on how to use the book in teaching different courses.

• Further information on the book’s case studies (insulin pump, mental health care system, wilderness weather system, digital learning system), as well as other case studies, such as the failure of the Ariane 5 launcher.


• Six web chapters covering process improvement, formal methods, interaction design, application architectures, documentation and aspect-oriented development.

• Web sections that add to the content presented in each chapter. These web sections are linked from breakout boxes in each chapter.

• PowerPoint presentations for all of the chapters in the book and additional PowerPoint presentations covering a range of systems engineering topics, available at pearsonglobaleditions.com/Sommerville.

In response to requests from users of the book, I have published a complete requirements specification for one of the system case studies on the book’s web site. It is difficult for students to get access to such documents and so understand their structure and complexity. To avoid confidentiality issues, I have re-engineered the requirements document from a real system, so there are no restrictions on its use.

Contact details

Website: software-engineering-book.com
Email: name: software.engineering.book; domain: gmail.com
Blog: iansommerville.com/systems-software-and-technology
YouTube: youtube.com/user/SoftwareEngBook
Facebook: facebook.com/sommerville.software.engineering
Twitter: @SoftwareEngBook or @iansommerville (for more general tweets)

Follow me on Twitter or Facebook to get updates on new material and comments on software and systems engineering.

Acknowledgements

A large number of people have contributed over the years to the evolution of this book, and I’d like to thank everyone (reviewers, students and book users) who has commented on previous editions and made constructive suggestions for change. I’d particularly like to thank my family, Anne, Ali, and Jane, for their love, help and support while I was working on this book (and all of the previous editions).

Ian Sommerville,
September 2014

Contents at a glance

Preface 3

Part 1 Introduction to Software Engineering 15
Chapter 1 Introduction 17
Chapter 2 Software processes 43
Chapter 3 Agile software development 72
Chapter 4 Requirements engineering 101
Chapter 5 System modeling 138
Chapter 6 Architectural design 167
Chapter 7 Design and implementation 196
Chapter 8 Software testing 226
Chapter 9 Software evolution 255

Part 2 System Dependability and Security 283
Chapter 10 Dependable systems 285
Chapter 11 Reliability engineering 306
Chapter 12 Safety engineering 339
Chapter 13 Security engineering 373
Chapter 14 Resilience engineering 408

Part 3 Advanced Software Engineering 435
Chapter 15 Software reuse 437
Chapter 16 Component-based software engineering 464
Chapter 17 Distributed software engineering 490
Chapter 18 Service-oriented software engineering 520
Chapter 19 Systems engineering 551
Chapter 20 Systems of systems 580
Chapter 21 Real-time software engineering 610

Part 4 Software Management 639
Chapter 22 Project management 641
Chapter 23 Project planning 667
Chapter 24 Quality management 700
Chapter 25 Configuration management 730

Glossary 757
Subject index 777
Author index 803

Pearson wishes to thank and acknowledge the following people for their work on the Global Edition:

Contributors
Sherif G. Aly, The American University in Cairo
Muthuraj M., Android developer

Reviewers
Mohit P. Tahiliani, National Institute of Technology Karnataka, Surathkal
Chitra Dhawale, P. R. Patil Group of Educational Institutes, Amravati
Sanjeevni Shantaiya, Disha Institute of Management & Technology

Contents

Preface 3

Part 1 Introduction to Software Engineering 15

Chapter 1 Introduction 17
1.1 Professional software development 19
1.2 Software engineering ethics 28
1.3 Case studies 31

Chapter 2 Software processes 43
2.1 Software process models 45
2.2 Process activities 54
2.3 Coping with change 61
2.4 Process improvement 65

Chapter 3 Agile software development 72
3.1 Agile methods 75
3.2 Agile development techniques 77
3.3 Agile project management 84
3.4 Scaling agile methods 88

Chapter 4 Requirements engineering 101
4.1 Functional and non-functional requirements 105
4.2 Requirements engineering processes 111
4.3 Requirements elicitation 112
4.4 Requirements specification 120
4.5 Requirements validation 129
4.6 Requirements change 130

Chapter 5 System modeling 138
5.1 Context models 141
5.2 Interaction models 144
5.3 Structural models 149
5.4 Behavioral models 154
5.5 Model-driven architecture 159

Chapter 6 Architectural design 167
6.1 Architectural design decisions 171
6.2 Architectural views 173
6.3 Architectural patterns 175
6.4 Application architectures 184

Chapter 7 Design and implementation 196
7.1 Object-oriented design using the UML 198
7.2 Design patterns 209
7.3 Implementation issues 212
7.4 Open-source development 219

Chapter 8 Software testing 226
8.1 Development testing 231
8.2 Test-driven development 242
8.3 Release testing 245
8.4 User testing 249

Chapter 9 Software evolution 255
9.1 Evolution processes 258
9.2 Legacy systems 261
9.3 Software maintenance 270

Part 2 System Dependability and Security 283

Chapter 10 Dependable systems 285
10.1 Dependability properties 288
10.2 Sociotechnical systems 291
10.3 Redundancy and diversity 295
10.4 Dependable processes 297
10.5 Formal methods and dependability 299

Chapter 11 Reliability engineering 306
11.1 Availability and reliability 309
11.2 Reliability requirements 312
11.3 Fault-tolerant architectures 318
11.4 Programming for reliability 325
11.5 Reliability measurement 331

Chapter 12 Safety engineering 339
12.1 Safety-critical systems 341
12.2 Safety requirements 344
12.3 Safety engineering processes 352
12.4 Safety cases 361

Chapter 13 Security engineering 373
13.1 Security and dependability 376
13.2 Security and organizations 380
13.3 Security requirements 382
13.4 Secure systems design 388
13.5 Security testing and assurance 402

Chapter 14 Resilience engineering 408
14.1 Cybersecurity 412
14.2 Sociotechnical resilience 416
14.3 Resilient systems design 424

Part 3 Advanced Software Engineering 435

Chapter 15 Software reuse 437
15.1 The reuse landscape 440
15.2 Application frameworks 443
15.3 Software product lines 446
15.4 Application system reuse 453

Chapter 16 Component-based software engineering 464
16.1 Components and component models 467
16.2 CBSE processes 473
16.3 Component composition 480

Chapter 17 Distributed software engineering 490
17.1 Distributed systems 492
17.2 Client–server computing 499
17.3 Architectural patterns for distributed systems 501
17.4 Software as a service 512

Chapter 18 Service-oriented software engineering 520
18.1 Service-oriented architecture 524
18.2 RESTful services 529
18.3 Service engineering 533
18.4 Service composition 541

Chapter 19 Systems engineering 551
19.1 Sociotechnical systems 556
19.2 Conceptual design 563
19.3 System procurement 566
19.4 System development 570
19.5 System operation and evolution 574

Chapter 20 Systems of systems 580
20.1 System complexity 584
20.2 Systems of systems classification 587
20.3 Reductionism and complex systems 590
20.4 Systems of systems engineering 593
20.5 Systems of systems architecture 599

Chapter 21 Real-time software engineering 610
21.1 Embedded system design 613
21.2 Architectural patterns for real-time software 620
21.3 Timing analysis 626
21.4 Real-time operating systems 631

Part 4 Software Management 639

Chapter 22 Project management 641
22.1 Risk management 644
22.2 Managing people 652
22.3 Teamwork 656

Chapter 23 Project planning 667
23.1 Software pricing 670
23.2 Plan-driven development 672
23.3 Project scheduling 675
23.4 Agile planning 680
23.5 Estimation techniques 682
23.6 COCOMO cost modeling 686

Chapter 24 Quality management 700
24.1 Software quality 703
24.2 Software standards 706
24.3 Reviews and inspections 710
24.4 Quality management and agile development 714
24.5 Software measurement 716

Chapter 25 Configuration management 730
25.1 Version management 735
25.2 System building 740
25.3 Change management 745
25.4 Release management 750

Glossary 757
Subject index 777
Author index 803

PART 1 Introduction to Software Engineering

My aim in this part of the book is to provide a general introduction to software engineering. The chapters in this part have been designed to support a one-semester first course in software engineering. I introduce important concepts such as software processes and agile methods, and describe essential software development activities, from requirements specification through to system evolution.

Chapter 1 is a general introduction that introduces professional software engineering and defines some software engineering concepts. I have also included a brief discussion of ethical issues in software engineering. It is important for software engineers to think about the wider implications of their work. This chapter also introduces four case studies that I use in the book. These are an information system for managing records of patients undergoing treatment for mental health problems (Mentcare), a control system for a portable insulin pump, an embedded system for a wilderness weather station and a digital learning environment (iLearn).

Chapters 2 and 3 cover software engineering processes and agile development. In Chapter 2, I introduce software process models, such as the waterfall model, and I discuss the basic activities that are part of these processes. Chapter 3 supplements this with a discussion of agile development methods for software engineering. This chapter has been extensively changed from previous editions, with a focus on agile development using Scrum and a discussion of agile practices such as stories for requirements definition and test-driven development.

The remaining chapters in this part are extended descriptions of the software process activities that are introduced in Chapter 2. Chapter 4 covers the critically important topic of requirements engineering, where the requirements for what a system should do are defined. Chapter 5 explains system modeling using the UML, where I focus on the use of use case diagrams, class diagrams, sequence diagrams and state diagrams for modeling a software system. In Chapter 6, I discuss the importance of software architecture and the use of architectural patterns in software design.

Chapter 7 introduces object-oriented design and the use of design patterns. I also introduce important implementation issues here—reuse, configuration management and host-target development—and discuss open-source development. Chapter 8 focuses on software testing, from unit testing during system development to the testing of software releases. I also discuss the use of test-driven development—an approach pioneered in agile methods but which has wide applicability. Finally, Chapter 9 presents an overview of software evolution issues. I cover evolution processes, software maintenance and legacy system management.

1 Introduction

Objectives

The objectives of this chapter are to introduce software engineering and to provide a framework for understanding the rest of the book. When you have read this chapter, you will:

• understand what software engineering is and why it is important;

• understand that the development of different types of software system may require different software engineering techniques;

• understand ethical and professional issues that are important for software engineers;

• have been introduced to four systems, of different types, which are used as examples throughout the book.

Contents

1.1 Professional software development
1.2 Software engineering ethics
1.3 Case studies


Software engineering is essential for the functioning of government, society, and national and international businesses and institutions. We can’t run the modern world without software. National infrastructures and utilities are controlled by computer-based systems, and most electrical products include a computer and controlling software. Industrial manufacturing and distribution is completely computerized, as is the financial system. Entertainment, including the music industry, computer games, and film and television, is software-intensive. More than 75% of the world’s population have a software-controlled mobile phone, and, by 2016, almost all of these will be Internet-enabled.

Software systems are abstract and intangible. They are not constrained by the properties of materials, nor are they governed by physical laws or by manufacturing processes. This simplifies software engineering, as there are no natural limits to the potential of software. However, because of the lack of physical constraints, software systems can quickly become extremely complex, difficult to understand, and expensive to change.

There are many different types of software system, ranging from simple embedded systems to complex, worldwide information systems. There are no universal notations, methods, or techniques for software engineering because different types of software require different approaches. Developing an organizational information system is completely different from developing a controller for a scientific instrument. Neither of these systems has much in common with a graphics-intensive computer game. All of these applications need software engineering; they do not all need the same software engineering methods and techniques.

There are still many reports of software projects going wrong and of “software failures.” Software engineering is criticized as inadequate for modern software development. However, in my opinion, many of these so-called software failures are a consequence of two factors:

1. Increasing system complexity As new software engineering techniques help us to build larger, more complex systems, the demands change. Systems have to be built and delivered more quickly; larger, even more complex systems are required; and systems have to have new capabilities that were previously thought to be impossible. New software engineering techniques have to be developed to meet the new challenges of delivering more complex software.

2. Failure to use software engineering methods It is fairly easy to write computer programs without using software engineering methods and techniques. Many companies have drifted into software development as their products and services have evolved. They do not use software engineering methods in their everyday work. Consequently, their software is often more expensive and less reliable than it should be. We need better software engineering education and training to address this problem.

Software engineers can be rightly proud of their achievements. Of course, we still have problems developing complex software, but without software engineering we would not have explored space and we would not have the Internet or modern telecommunications. All forms of travel would be more dangerous and expensive. Challenges for humanity in the 21st century are climate change, fewer natural resources, changing demographics, and an expanding world population. We will rely on software engineering to develop the systems that we need to cope with these issues.

History of software engineering

The notion of software engineering was first proposed in 1968 at a conference held to discuss what was then called the software crisis (Naur and Randell 1969). It became clear that individual approaches to program development did not scale up to large and complex software systems. These were unreliable, cost more than expected, and were delivered late. Throughout the 1970s and 1980s, a variety of new software engineering techniques and methods were developed, such as structured programming, information hiding, and object-oriented development. Tools and standard notations were developed which are the basis of today’s software engineering.

http://software-engineering-book.com/web/history/

1.1 Professional software development

Lots of people write programs. People in business write spreadsheet programs to simplify their jobs; scientists and engineers write programs to process their experimental data; hobbyists write programs for their own interest and enjoyment. However, most software development is a professional activity in which software is developed for business purposes, for inclusion in other devices, or as software products such as information systems and computer-aided design systems. The key distinctions are that professional software is intended for use by someone apart from its developer and that teams rather than individuals usually develop the software. It is maintained and changed throughout its life.

Software engineering is intended to support professional software development rather than individual programming. It includes techniques that support program specification, design, and evolution, none of which are normally relevant for personal software development. To help you to get a broad view of software engineering, I have summarized frequently asked questions about the subject in Figure 1.1.

Many people think that software is simply another word for computer programs. However, when we are talking about software engineering, software is not just the programs themselves but also all associated documentation, libraries, support websites, and configuration data that are needed to make these programs useful. A professionally developed software system is often more than a single program. A system may consist of several separate programs and configuration files that are used to set up these programs. It may include system documentation, which describes the structure of the system, user documentation, which explains how to use the system, and websites for users to download recent product information.

This is one of the important differences between professional and amateur software development. If you are writing a program for yourself, no one else will use it and you don’t have to worry about writing program guides, documenting the program design, and so on. However, if you are writing software that other people will use and other engineers will change, then you usually have to provide additional information as well as the code of the program.

Figure 1.1 Frequently asked questions about software engineering

What is software?
Computer programs and associated documentation. Software products may be developed for a particular customer or may be developed for a general market.

What are the attributes of good software?
Good software should deliver the required functionality and performance to the user and should be maintainable, dependable and usable.

What is software engineering?
Software engineering is an engineering discipline that is concerned with all aspects of software production from initial conception to operation and maintenance.

What are the fundamental software engineering activities?
Software specification, software development, software validation and software evolution.

What is the difference between software engineering and computer science?
Computer science focuses on theory and fundamentals; software engineering is concerned with the practicalities of developing and delivering useful software.

What is the difference between software engineering and system engineering?
System engineering is concerned with all aspects of computer-based systems development including hardware, software and process engineering. Software engineering is part of this more general process.

What are the key challenges facing software engineering?
Coping with increasing diversity, demands for reduced delivery times and developing trustworthy software.

What are the costs of software engineering?
Roughly 60% of software costs are development costs, 40% are testing costs. For custom software, evolution costs often exceed development costs.

What are the best software engineering techniques and methods?
While all software projects have to be professionally managed and developed, different techniques are appropriate for different types of system. For example, games should always be developed using a series of prototypes whereas safety-critical control systems require a complete and analyzable specification to be developed. There are no methods and techniques that are good for everything.

What differences has the Internet made to software engineering?
Not only has the Internet led to the development of massive, highly distributed, service-based systems, it has also supported the creation of an “app” industry for mobile devices which has changed the economics of software.

Software engineers are concerned with developing software products, that is, software that can be sold to a customer. There are two kinds of software product:

1. Generic products These are stand-alone systems that are produced by a development organization and sold on the open market to any customer who is able to buy them. Examples of this type of product include apps for mobile devices, software for PCs such as databases, word processors, drawing packages, and project management tools. This kind of software also includes "vertical" applications designed for a specific market such as library information systems, accounting systems, or systems for maintaining dental records.

2. Customized (or bespoke) software These are systems that are commissioned by and developed for a particular customer. A software contractor designs and implements the software especially for that customer. Examples of this type of software include control systems for electronic devices, systems written to support a particular business process, and air traffic control systems.

The critical distinction between these types of software is that, in generic products, the organization that develops the software controls the software specification. This means that if they run into development problems, they can rethink what is to be developed. For custom products, the specification is developed and controlled by the organization that is buying the software. The software developers must work to that specification.

However, the distinction between these system product types is becoming increasingly blurred. More and more systems are now being built with a generic product as a base, which is then adapted to suit the requirements of a customer. Enterprise Resource Planning (ERP) systems, such as systems from SAP and Oracle, are the best examples of this approach. Here, a large and complex system is adapted for a company by incorporating information about business rules and processes, reports required, and so on.

When we talk about the quality of professional software, we have to consider that the software is used and changed by people apart from its developers. Quality is therefore not just concerned with what the software does. Rather, it has to include the software's behavior while it is executing and the structure and organization of the system programs and associated documentation. This is reflected in the software's quality or non-functional attributes. Examples of these attributes are the software's response time to a user query and the understandability of the program code.
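As a small illustration of a non-functional attribute such as response time, the sketch below times a single call to a hypothetical query function. The `lookup` function, the record format, and the 50 ms budget are all invented for this example; nothing here is prescribed by the chapter.

```python
import time

def lookup(query, records):
    """A stand-in user-query operation: select matching records."""
    return [r for r in records if query in r]

def response_time_ms(func, *args):
    """Measure the wall-clock response time of one call, in milliseconds."""
    start = time.perf_counter()
    func(*args)
    return (time.perf_counter() - start) * 1000.0

records = ["dental record %d" % i for i in range(100_000)]
elapsed = response_time_ms(lookup, "99999", records)

# A quality requirement might cap response time, e.g. at a 50 ms budget.
print("query took %.2f ms; within budget: %s" % (elapsed, elapsed < 50.0))
```

The point is that such an attribute is measurable and can be stated as a requirement, quite separately from whether the query returns the right answer.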

The specific set of attributes that you might expect from a software system obviously depends on its application. Therefore, an aircraft control system must be safe, an interactive game must be responsive, a telephone switching system must be reliable, and so on. These can be generalized into the set of attributes shown in Figure 1.2, which I think are the essential characteristics of a professional software system.

1.1.1 Software engineering

Software engineering is an engineering discipline that is concerned with all aspects of software production from the early stages of system specification through to maintaining the system after it has gone into use. In this definition, there are two key phrases:

1. Engineering discipline Engineers make things work. They apply theories, methods, and tools where these are appropriate. However, they use them selectively and always try to discover solutions to problems even when there are no applicable theories and methods. Engineers also recognize that they must work within organizational and financial constraints, and they must look for solutions within these constraints.

2. All aspects of software production Software engineering is not just concerned with the technical processes of software development. It also includes activities such as software project management and the development of tools, methods, and theories to support software development.

Figure 1.2 Essential attributes of good software

Acceptability: Software must be acceptable to the type of users for which it is designed. This means that it must be understandable, usable, and compatible with other systems that they use.

Dependability and security: Software dependability includes a range of characteristics including reliability, security, and safety. Dependable software should not cause physical or economic damage in the event of system failure. Software has to be secure so that malicious users cannot access or damage the system.

Efficiency: Software should not make wasteful use of system resources such as memory and processor cycles. Efficiency therefore includes responsiveness, processing time, resource utilization, etc.

Maintainability: Software should be written in such a way that it can evolve to meet the changing needs of customers. This is a critical attribute because software change is an inevitable requirement of a changing business environment.

Engineering is about getting results of the required quality within schedule and budget. This often involves making compromises; engineers cannot be perfectionists. People writing programs for themselves, however, can spend as much time as they wish on the program development.

In general, software engineers adopt a systematic and organized approach to their work, as this is often the most effective way to produce high-quality software. However, engineering is all about selecting the most appropriate method for a set of circumstances, so a more creative, less formal approach to development may be the right one for some kinds of software. A more flexible software process that accommodates rapid change is particularly appropriate for the development of interactive web-based systems and mobile apps, which require a blend of software and graphical design skills.

Software engineering is important for two reasons:

1. More and more, individuals and society rely on advanced software systems. We need to be able to produce reliable and trustworthy systems economically and quickly.

2. It is usually cheaper, in the long run, to use software engineering methods and techniques for professional software systems rather than just write programs as a personal programming project. Failure to use software engineering methods leads to higher costs for testing, quality assurance, and long-term maintenance.

The systematic approach that is used in software engineering is sometimes called a software process. A software process is a sequence of activities that leads to the production of a software product. Four fundamental activities are common to all software processes:

1. Software specification, where customers and engineers define the software that is to be produced and the constraints on its operation.

2. Software development, where the software is designed and programmed.

3. Software validation, where the software is checked to ensure that it is what the customer requires.

4. Software evolution, where the software is modified to reflect changing customer and market requirements.
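The idea of a process as a "sequence of activities" can be sketched as an ordered pipeline of named stages. This is only a toy model to make the ordering explicit; the handler names and their outputs are invented, not part of any real process description.

```python
# A toy model of a software process as an ordered sequence of activities.
ACTIVITIES = ("specification", "development", "validation", "evolution")

def run_process(handlers):
    """Run each activity's handler in order, passing earlier artifacts forward."""
    artifacts = {}
    for activity in ACTIVITIES:
        artifacts[activity] = handlers[activity](artifacts)
    return artifacts

# Hypothetical handlers that just record what each stage might produce.
handlers = {
    "specification": lambda a: "requirements document",
    "development":   lambda a: "programs + configuration files",
    "validation":    lambda a: "test report for " + a["development"],
    "evolution":     lambda a: "change requests against " + a["specification"],
}

result = run_process(handlers)
```

Note that validation and evolution consume artifacts from earlier activities, which is why the ordering matters even though, as the next paragraph explains, real processes organize these activities in very different ways.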

Different types of systems need different development processes, as I explain in Chapter 2. For example, real-time software in an aircraft has to be completely specified before development begins. In e-commerce systems, the specification and the program are usually developed together. Consequently, these generic activities may be organized in different ways and described at different levels of detail, depending on the type of software being developed.

Software engineering is related to both computer science and systems engineering.

1. Computer science is concerned with the theories and methods that underlie computers and software systems, whereas software engineering is concerned with the practical problems of producing software. Some knowledge of computer science is essential for software engineers in the same way that some knowledge of physics is essential for electrical engineers. Computer science theory, however, is often most applicable to relatively small programs. Elegant theories of computer science are rarely relevant to large, complex problems that require a software solution.

2. System engineering is concerned with all aspects of the development and evolution of complex systems where software plays a major role. System engineering is therefore concerned with hardware development, policy and process design, and system deployment, as well as software engineering. System engineers are involved in specifying the system, defining its overall architecture, and then integrating the different parts to create the finished system.

As I discuss in the next section, there are many different types of software. There are no universal software engineering methods or techniques that apply to all of them. However, there are four related issues that affect many different types of software:

1. Heterogeneity Increasingly, systems are required to operate as distributed systems across networks that include different types of computer and mobile devices. As well as running on general-purpose computers, software may also have to execute on mobile phones and tablets. You often have to integrate new software with older legacy systems written in different programming languages. The challenge here is to develop techniques for building dependable software that is flexible enough to cope with this heterogeneity.

2. Business and social change Businesses and society are changing incredibly quickly as emerging economies develop and new technologies become available. They need to be able to change their existing software and to rapidly develop new software. Many traditional software engineering techniques are time consuming, and delivery of new systems often takes longer than planned. They need to evolve so that the time required for software to deliver value to its customers is reduced.

3. Security and trust As software is intertwined with all aspects of our lives, it is essential that we can trust that software. This is especially true for remote software systems accessed through a web page or web service interface. We have to make sure that malicious users cannot successfully attack our software and that information security is maintained.

4. Scale Software has to be developed across a very wide range of scales, from very small embedded systems in portable or wearable devices through to Internet-scale, cloud-based systems that serve a global community.

To address these challenges, we will need new tools and techniques as well as innovative ways of combining and using existing software engineering methods.

1.1.2 Software engineering diversity

Software engineering is a systematic approach to the production of software that takes into account practical cost, schedule, and dependability issues, as well as the needs of software customers and producers. The specific methods, tools, and techniques used depend on the organization developing the software, the type of software, and the people involved in the development process. There are no universal software engineering methods that are suitable for all systems and all companies. Rather, a diverse set of software engineering methods and tools has evolved over the past 50 years. However, the SEMAT initiative (Jacobson et al. 2013) proposes that there can be a fundamental meta-process that can be instantiated to create different kinds of process. This is at an early stage of development and may be a basis for improving our current software engineering methods.

Perhaps the most significant factor in determining which software engineering methods and techniques are most important is the type of application being developed. There are many different types of application, including:

1. Stand-alone applications These are application systems that run on a personal computer or apps that run on a mobile device. They include all necessary functionality and may not need to be connected to a network. Examples of such applications are office applications on a PC, CAD programs, photo manipulation software, travel apps, productivity apps, and so on.

2. Interactive transaction-based applications These are applications that execute on a remote computer and that are accessed by users from their own computers, phones, or tablets. Obviously, these include web applications such as e-commerce applications where you interact with a remote system to buy goods and services. This class of application also includes business systems, where a business provides access to its systems through a web browser or special-purpose client program, and cloud-based services, such as mail and photo sharing. Interactive applications often incorporate a large data store that is accessed and updated in each transaction.

3. Embedded control systems These are software control systems that control and manage hardware devices. Numerically, there are probably more embedded systems than any other type of system. Examples of embedded systems include the software in a mobile (cell) phone, software that controls antilock braking in a car, and software in a microwave oven to control the cooking process.

4. Batch processing systems These are business systems that are designed to process data in large batches. They process large numbers of individual inputs to create corresponding outputs. Examples of batch systems are periodic billing systems, such as phone billing systems, and salary payment systems.

5. Entertainment systems These are systems for personal use that are intended to entertain the user. Most of these systems are games of one kind or another, which may run on special-purpose console hardware. The quality of the user interaction offered is the most important distinguishing characteristic of entertainment systems.

6. Systems for modeling and simulation These are systems that are developed by scientists and engineers to model physical processes or situations, which include many separate, interacting objects. These are often computationally intensive and require high-performance parallel systems for execution.

7. Data collection and analysis systems Data collection systems are systems that collect data from their environment and send that data to other systems for processing. The software may have to interact with sensors and often is installed in a hostile environment such as inside an engine or in a remote location. "Big data" analysis may involve cloud-based systems carrying out statistical analysis and looking for relationships in the collected data.

8. Systems of systems These are systems, used in enterprises and other large organizations, that are composed of a number of other software systems. Some of these may be generic software products, such as an ERP system. Other systems in the assembly may be specially written for that environment.
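To make one of these categories concrete, here is a minimal sketch in the style of a batch processing system (type 4 above): it consumes a whole batch of call records in one run and emits one bill per customer. The record format, the `run_billing_batch` name, and the tariff are all invented for illustration; a real billing system would be vastly more elaborate.

```python
from collections import defaultdict

# Hypothetical input format: one call record per line, "customer_id,minutes".
RATE_PER_MINUTE = 0.05  # invented tariff for the example

def run_billing_batch(lines):
    """Process an entire batch of call records; return a bill per customer."""
    minutes = defaultdict(int)
    for line in lines:
        customer, mins = line.strip().split(",")
        minutes[customer] += int(mins)
    # One output per distinct input customer, produced only after the
    # whole batch has been read: the defining shape of batch processing.
    return {cust: round(m * RATE_PER_MINUTE, 2) for cust, m in minutes.items()}

batch = ["alice,10", "bob,3", "alice,7"]
bills = run_billing_batch(batch)
# alice: 17 minutes -> 0.85; bob: 3 minutes -> 0.15
```

Unlike an interactive transaction-based application, nothing is returned to a user per input; the entire input set is turned into a corresponding output set in one run.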

Of course, the boundaries between these system types are blurred. If you develop a game for a phone, you have to take into account the same constraints (power, hardware interaction) as the developers of the phone software. Batch processing systems are often used in conjunction with web-based transaction systems. For example, in a company, travel expense claims may be submitted through a web application but processed in a batch application for monthly payment.

Each type of system requires specialized software engineering techniques because the software has different characteristics. For example, an embedded control system in an automobile is safety-critical and is burned into ROM (read-only memory) when installed in the vehicle. It is therefore very expensive to change. Such a system needs extensive verification and validation so that the chances of having to recall cars after sale to fix software problems are minimized. User interaction is minimal (or perhaps nonexistent), so there is no need to use a development process that relies on user interface prototyping.

For an interactive web-based system or app, iterative development and delivery is the best approach, with the system being composed of reusable components. However, such an approach may be impractical for a system of systems, where detailed specifications of the system interactions have to be specified in advance so that each system can be separately developed.

Nevertheless, there are software engineering fundamentals that apply to all types of software systems:

1. They should be developed using a managed and understood development process. The organization developing the software should plan the development process and have clear ideas of what will be produced and when it will be completed. Of course, the specific process that you should use depends on the type of software that you are developing.

2. Dependability and performance are important for all types of system. Software should behave as expected, without failures, and should be available for use when it is required. It should be safe in its operation and, as far as possible, should be secure against external attack. The system should perform efficiently and should not waste resources.

3. Understanding and managing the software specification and requirements (what the software should do) are important. You have to know what different customers and users of the system expect from it, and you have to manage their expectations so that a useful system can be delivered within budget and to schedule.

4. You should make effective use of existing resources. This means that, where appropriate, you should reuse software that has already been developed rather than write new software.

These fundamental notions of process, dependability, requirements, management, and reuse are important themes of this book. Different methods reflect them in different ways, but they underlie all professional software development.

These fundamentals are independent of the programming language used for software development. I don't cover specific programming techniques in this book because these vary dramatically from one type of system to another. For example, a dynamic language, such as Ruby, is the right type of language for interactive system development but is inappropriate for embedded systems engineering.

1.1.3 Internet software engineering

The development of the Internet and the World Wide Web has had a profound effect on all of our lives. Initially, the web was primarily a universally accessible information store, and it had little effect on software systems. These systems ran on local computers and were only accessible from within an organization. Around 2000, the web started to evolve, and more and more functionality was added to browsers. This meant that web-based systems could be developed where, instead of a special-purpose user interface, these systems could be accessed using a web browser. This led to the development of a vast range of new system products that delivered innovative services, accessed over the web. These are often funded by adverts that are displayed on the user's screen and do not involve direct payment from users.

As well as these system products, the development of web browsers that could run small programs and do some local processing led to an evolution in business and organizational software. Instead of writing software and deploying it on users' PCs, the software was deployed on a web server. This made it much cheaper to change and upgrade the software, as there was no need to install the software on every PC. It also reduced costs, as user interface development is particularly expensive. Wherever it has been possible to do so, businesses have moved to web-based interaction with company software systems.

The notion of software as a service (Chapter 17) was proposed early in the 21st century. This has now become the standard approach to the delivery of web-based system products such as Google Apps, Microsoft Office 365, and Adobe Creative Suite. More and more software runs on remote "clouds" instead of local servers and is accessed over the Internet. A computing cloud is a huge number of linked computer systems that is shared by many users. Users do not buy software but pay according to how much the software is used or are given free access in return for watching adverts that are displayed on their screen. If you use services such as web-based mail, storage, or video, you are using a cloud-based system.

The advent of the web has led to a dramatic change in the way that business software is organized. Before the web, business applications were mostly monolithic, single programs running on single computers or computer clusters. Communications were local, within an organization. Now, software is highly distributed, sometimes across the world. Business applications are not programmed from scratch but involve extensive reuse of components and programs.

This change in software organization has had a major effect on software engineering for web-based systems. For example:

1. Software reuse has become the dominant approach for constructing web-based systems. When building these systems, you think about how you can assemble them from preexisting software components and systems, often bundled together in a framework.

2. It is now generally recognized that it is impractical to specify all the requirements for such systems in advance. Web-based systems are always developed and delivered incrementally.

3. Software may be implemented using service-oriented software engineering, where the software components are stand-alone web services. I discuss this approach to software engineering in Chapter 18.

4. Interface development technologies such as AJAX (Holdener 2008) and HTML5 (Freeman 2011) have emerged that support the creation of rich interfaces within a web browser.

The fundamental ideas of software engineering, discussed in the previous section, apply to web-based software, as they do to other types of software. Web-based systems are getting larger and larger, so software engineering techniques that deal with scale and complexity are relevant for these systems.

1.2 Software engineering ethics

Like other engineering disciplines, software engineering is carried out within a social and legal framework that limits the freedom of people working in that area. As a software engineer, you must accept that your job involves wider responsibilities than simply the application of technical skills. You must also behave in an ethical and morally responsible way if you are to be respected as a professional engineer.

It goes without saying that you should uphold normal standards of honesty and integrity. You should not use your skills and abilities to behave in a dishonest way or in a way that will bring disrepute to the software engineering profession. However, there are areas where standards of acceptable behavior are not bound by laws but by the more tenuous notion of professional responsibility. Some of these are:

1. Confidentiality You should normally respect the confidentiality of your employers or clients regardless of whether or not a formal confidentiality agreement has been signed.

2. Competence You should not misrepresent your level of competence. You should not knowingly accept work that is outside your competence.

3. Intellectual property rights You should be aware of local laws governing the use of intellectual property such as patents and copyright. You should be careful to ensure that the intellectual property of employers and clients is protected.

4. Computer misuse You should not use your technical skills to misuse other people's computers. Computer misuse ranges from relatively trivial (game playing on an employer's machine) to extremely serious (dissemination of viruses or other malware).

Software Engineering Code of Ethics and Professional Practice
ACM/IEEE-CS Joint Task Force on Software Engineering Ethics and Professional Practices

PREAMBLE

The short version of the code summarizes aspirations at a high level of abstraction; the clauses that are included in the full version give examples and details of how these aspirations change the way we act as software engineering professionals. Without the aspirations, the details can become legalistic and tedious; without the details, the aspirations can become high sounding but empty; together, the aspirations and the details form a cohesive code.

Software engineers shall commit themselves to making the analysis, specification, design, development, testing, and maintenance of software a beneficial and respected profession. In accordance with their commitment to the health, safety, and welfare of the public, software engineers shall adhere to the following Eight Principles:

1. PUBLIC — Software engineers shall act consistently with the public interest.
2. CLIENT AND EMPLOYER — Software engineers shall act in a manner that is in the best interests of their client and employer consistent with the public interest.
3. PRODUCT — Software engineers shall ensure that their products and related modifications meet the highest professional standards possible.
4. JUDGMENT — Software engineers shall maintain integrity and independence in their professional judgment.
5. MANAGEMENT — Software engineering managers and leaders shall subscribe to and promote an ethical approach to the management of software development and maintenance.
6. PROFESSION — Software engineers shall advance the integrity and reputation of the profession consistent with the public interest.
7. COLLEAGUES — Software engineers shall be fair to and supportive of their colleagues.
8. SELF — Software engineers shall participate in lifelong learning regarding the practice of their profession and shall promote an ethical approach to the practice of the profession.

Figure 1.3 The ACM/IEEE Code of Ethics (ACM/IEEE-CS Joint Task Force on Software Engineering Ethics and Professional Practices, short version. http://www.acm.org/about/se-code) (© 1999 by the ACM, Inc. and the IEEE, Inc.)

Professional societies and institutions have an important role to play in setting ethical standards. Organizations such as the ACM, the IEEE (Institute of Electrical and Electronics Engineers), and the British Computer Society publish a code of professional conduct or code of ethics. Members of these organizations undertake to follow that code when they sign up for membership. These codes of conduct are generally concerned with fundamental ethical behavior.

Professional associations, notably the ACM and the IEEE, have cooperated to produce a joint code of ethics and professional practice. This code exists in both a short form, shown in Figure 1.3, and a longer form (Gotterbarn, Miller, and Rogerson 1999) that adds detail and substance to the shorter version. The rationale behind this code is summarized in the first two paragraphs of the longer form:

Computers have a central and growing role in commerce, industry, government, medicine, education, entertainment and society at large. Software engineers are those who contribute by direct participation or by teaching, to the analysis, specification, design, development, certification, maintenance and testing of software systems. Because of their roles in developing software systems, software engineers have significant opportunities to do good or cause harm, to enable others to do good or cause harm, or to influence others to do good or cause harm. To ensure, as much as possible, that their efforts will be used for good, software engineers must commit themselves to making software engineering a beneficial and respected profession. In accordance with that commitment, software engineers shall adhere to the following Code of Ethics and Professional Practice.

The Code contains eight Principles related to the behaviour of and decisions made by professional software engineers, including practitioners, educators, managers, supervisors and policy makers, as well as trainees and students of the profession. The Principles identify the ethically responsible relationships in which individuals, groups, and organizations participate and the primary obligations within these relationships. The Clauses of each Principle are illustrations of some of the obligations included in these relationships. These obligations are founded in the software engineer's humanity, in special care owed to people affected by the work of software engineers, and the unique elements of the practice of software engineering. The Code prescribes these as obligations of anyone claiming to be or aspiring to be a software engineer.†

In any situation where different people have different views and objectives, you are likely to be faced with ethical dilemmas. For example, if you disagree, in principle, with the policies of more senior management in the company, how should you react? Clearly, this depends on the people involved and the nature of the disagreement. Is it best to argue a case for your position from within the organization or to resign in principle? If you feel that there are problems with a software project, when do you reveal these problems to management? If you discuss these while they are just a suspicion, you may be overreacting to a situation; if you leave it too long, it may be impossible to resolve the difficulties.

We all face such ethical dilemmas in our professional lives, and, fortunately, in most cases they are either relatively minor or can be resolved without too much difficulty. Where they cannot be resolved, the engineer is faced with, perhaps, another problem. The principled action may be to resign from their job, but this may well affect others such as their partner or their children.

A difficult situation for professional engineers arises when their employer acts in an unethical way. Say a company is responsible for developing a safety-critical system and, because of time pressure, falsifies the safety validation records. Is the engineer's responsibility to maintain confidentiality or to alert the customer or publicize, in some way, that the delivered system may be unsafe?

†ACM/IEEE-CS Joint Task Force on Software Engineering Ethics and Professional Practices, short version Preamble. http://www.acm.org/about/se-code Copyright © 1999 by the Association for Computing Machinery, Inc. and the Institute of Electrical and Electronics Engineers, Inc.

1.3 Case studies 31

The problem here is that there are no absolutes when it comes to safety. Although the system may not have been validated according to predefined criteria, these criteria may be too strict. The system may actually operate safely throughout its lifetime. It is also the case that, even when properly validated, the system may fail and cause an accident. Early disclosure of problems may result in damage to the employer and other employees; failure to disclose problems may result in damage to others.

You must make up your own mind in these matters. The appropriate ethical position here depends on the views of the people involved. The potential for damage, the extent of the damage, and the people affected by the damage should influence the decision. If the situation is very dangerous, it may be justified to publicize it using the national press or social media. However, you should always try to resolve the situation while respecting the rights of your employer.

Another ethical issue is participation in the development of military and nuclear systems. Some people feel strongly about these issues and do not wish to participate in any systems development associated with defense systems. Others will work on military systems but not on weapons systems. Yet others feel that national security is an overriding principle and have no ethical objections to working on weapons systems.

In this situation, it is important that both employers and employees should make their views known to each other in advance. Where an organization is involved in military or nuclear work, it should be able to specify that employees must be willing to accept any work assignment. Equally, if an employee is taken on and makes clear that he or she does not wish to work on such systems, employers should not exert pressure to do so at some later date.

The general area of ethics and professional responsibility is increasingly important as software-intensive systems pervade every aspect of work and everyday life. It can be considered from a philosophical standpoint where the basic principles of ethics are considered and software engineering ethics are discussed with reference to these basic principles. This is the approach taken by Laudon (Laudon 1995) and Johnson (Johnson 2001). More recent texts such as that by Tavani (Tavani 2013) introduce the notion of cyberethics and cover both the philosophical background and practical and legal issues. They include ethical issues for technology users as well as developers.

I find that a philosophical approach is too abstract and difficult to relate to everyday experience, so I prefer the more concrete approach embodied in professional codes of conduct (Bott 2005; Duquenoy 2007). I think that ethics are best discussed in a software engineering context and not as a subject in its own right. Therefore, I do not discuss software engineering ethics in an abstract way but include examples in the exercises that can be the starting point for a group discussion.

1.3 Case studies

To illustrate software engineering concepts, I use examples from four different types of system. I have deliberately not used a single case study, as one of the key messages in this book is that software engineering practice depends on the type of systems being produced. I therefore choose an appropriate example when discussing concepts such as safety and dependability, system modeling, reuse, etc.

The system types that I use as case studies are:

1. An embedded system This is a system where the software controls some hardware device and is embedded in that device. Issues in embedded systems typically include physical size, responsiveness, and power management. The example of an embedded system that I use is a software system to control an insulin pump for people who have diabetes.

2. An information system The primary purpose of this type of system is to manage and provide access to a database of information. Issues in information systems include security, usability, privacy, and maintaining data integrity. The example of an information system used is a medical records system.

3. A sensor-based data collection system This is a system whose primary purposes are to collect data from a set of sensors and to process that data in some way. The key requirements of such systems are reliability, even in hostile environmental conditions, and maintainability. The example of a data collection system that I use is a wilderness weather station.

4. A support environment This is an integrated collection of software tools that are used to support some kind of activity. Programming environments, such as Eclipse (Vogel 2012), will be the most familiar type of environment for readers of this book. I describe an example here of a digital learning environment that is used to support students' learning in schools.

I introduce each of these systems in this chapter; more information about each of them is available on the website (software-engineering-book.com).

1.3.1 An insulin pump control system

An insulin pump is a medical system that simulates the operation of the pancreas (an internal organ). The software controlling this system is an embedded system that collects information from a sensor and controls a pump that delivers a controlled dose of insulin to a user.

People who suffer from diabetes use the system. Diabetes is a relatively common condition in which the human pancreas is unable to produce sufficient quantities of a hormone called insulin. Insulin metabolizes glucose (sugar) in the blood. The conventional treatment of diabetes involves regular injections of genetically engineered insulin. Diabetics measure their blood sugar levels periodically using an external meter and then estimate the dose of insulin they should inject.

The problem is that the level of insulin required does not just depend on the blood glucose level but also on the time of the last insulin injection. Irregular checking can lead to very low levels of blood glucose (if there is too much insulin) or very high levels of blood sugar (if there is too little insulin).

Low blood glucose is, in the short term, a more serious condition as it can result in temporary brain malfunctioning and, ultimately, unconsciousness and death. In the long term, however, continual high levels of blood glucose can lead to eye damage, kidney damage, and heart problems.

Figure 1.4 Insulin pump hardware architecture

Figure 1.5 Activity model of the insulin pump

Advances in developing miniaturized sensors have meant that it is now possible to develop automated insulin delivery systems. These systems monitor blood sugar levels and deliver an appropriate dose of insulin when required. Insulin delivery systems like this one are now available and are used by patients who find it difficult to control their insulin levels. In future, it may be possible for diabetics to have such systems permanently attached to their bodies.

A software-controlled insulin delivery system uses a microsensor embedded in the patient to measure some blood parameter that is proportional to the sugar level. This is then sent to the pump controller. This controller computes the sugar level and the amount of insulin that is needed. It then sends signals to a miniaturized pump to deliver the insulin via a permanently attached needle.

Figure 1.4 shows the hardware components and organization of the insulin pump. To understand the examples in this book, all you need to know is that the blood sensor measures the electrical conductivity of the blood under different conditions and that these values can be related to the blood sugar level. The insulin pump delivers one unit of insulin in response to a single pulse from a controller. Therefore, to deliver 10 units of insulin, the controller sends 10 pulses to the pump. Figure 1.5 is a Unified Modeling

Language (UML) activity model that illustrates how the software transforms an input blood sugar level to a sequence of commands that drive the insulin pump.

Figure 1.6 The organization of the Mentcare system
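To make the sense–compute–actuate cycle of Figure 1.5 concrete, here is a minimal sketch of that kind of control loop. It is illustrative only: the function names, dose rules, and safety thresholds below are invented for this example and are not taken from a real insulin pump.

```python
# Simplified sketch of an insulin pump control step (illustrative only).
# The dose rules and safety limits here are invented, not clinical values.

MAX_SINGLE_DOSE = 4              # safety cap on units delivered in one step
SAFE_MIN, SAFE_MAX = 4.0, 7.0    # assumed safe blood sugar band

def compute_dose(sugar_level, previous_level):
    """Return the number of insulin units to deliver for this reading."""
    if sugar_level < SAFE_MIN:            # sugar already low: never dose
        return 0
    if sugar_level <= SAFE_MAX:           # in safe band: dose only if rising
        return 1 if sugar_level > previous_level else 0
    # Above the safe band: dose in proportion to how fast sugar is rising,
    # capped so a sensor glitch cannot cause a dangerous overdose.
    rise = max(sugar_level - previous_level, 0)
    return min(round(rise) + 1, MAX_SINGLE_DOSE)

def control_step(sensor, pump, previous_level):
    """One pass of the loop: read sensor, compute dose, pulse the pump."""
    level = sensor.read()                 # conductivity reading -> sugar level
    dose = compute_dose(level, previous_level)
    for _ in range(dose):                 # one pulse delivers one unit
        pump.pulse()
    return level
```

Note how the sketch mirrors the activity model: analyze the sensor reading, compute the insulin requirement, then drive the pump with one pulse per unit.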

Clearly, this is a safety-critical system. If the pump fails to operate or does not operate correctly, then the user's health may be damaged or they may fall into a coma because their blood sugar levels are too high or too low. This system must therefore meet two essential high-level requirements:

1. The system shall be available to deliver insulin when required.

2. The system shall perform reliably and deliver the correct amount of insulin to counteract the current level of blood sugar.

The system must therefore be designed and implemented to ensure that it always meets these requirements. More detailed requirements, and discussions of how to ensure that the system is safe, are presented in later chapters.

1.3.2 A patient information system for mental health care

A patient information system to support mental health care (the Mentcare system) is a medical information system that maintains information about patients suffering from mental health problems and the treatments that they have received. Most mental health patients do not require dedicated hospital treatment but need to attend specialist clinics regularly where they can meet a doctor who has detailed knowledge of their problems. To make it easier for patients to attend, these clinics are not just run in hospitals. They may also be held in local medical practices or community centers.

The Mentcare system (Figure 1.6) is a patient information system that is intended for use in clinics. It makes use of a centralized database of patient information but has also been designed to run on a laptop, so that it may be accessed and used from sites that do not have secure network connectivity. When the local systems have secure network access, they use patient information in the database, but they can download and use local copies of patient records when they are disconnected. The system is not a complete medical records system and so does not maintain information about other medical conditions. However, it may interact and exchange data with other clinical information systems.
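The connected/disconnected access pattern just described can be sketched as follows. Every class and method name here is hypothetical; the sketch only illustrates the idea of working against the central database when a secure connection exists and falling back to local copies otherwise.

```python
# Hypothetical sketch of the Mentcare connected/disconnected access pattern.
# The real system is not shown; names and interfaces are invented.

class RecordStore:
    def __init__(self, server, local_cache):
        self.server = server        # central patient database (when reachable)
        self.local = local_cache    # records downloaded for offline use

    def get_record(self, patient_id):
        if self.server.is_reachable():
            record = self.server.fetch(patient_id)
            self.local.save(patient_id, record)   # keep a local copy
            return record
        return self.local.load(patient_id)        # disconnected: local copy

    def update_record(self, patient_id, record):
        self.local.save(patient_id, record)       # always record locally
        if self.server.is_reachable():
            self.server.push(patient_id, record)  # sync now, else on reconnect
```

As Section 1.3.2 later notes, keeping such local copies trades some privacy risk for availability; the sketch makes that trade-off visible in code.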

This system has two purposes:

1. To generate management information that allows health service managers to assess performance against local and government targets.

2. To provide medical staff with timely information to support the treatment of patients.

Patients who suffer from mental health problems are sometimes irrational and disorganized, so they may miss appointments, deliberately or accidentally lose prescriptions and medication, forget instructions, and make unreasonable demands on medical staff. They may drop in on clinics unexpectedly. In a minority of cases, they may be a danger to themselves or to other people. They may regularly change address or may be homeless on a long-term or short-term basis. Where patients are dangerous, they may need to be “sectioned”—that is, confined to a secure hospital for treatment and observation.

Users of the system include clinical staff such as doctors, nurses, and health visitors (nurses who visit people at home to check on their treatment). Nonmedical users include receptionists who make appointments, medical records staff who maintain the records system, and administrative staff who generate reports.

The system is used to record information about patients (name, address, age, next of kin, etc.), consultations (date, doctor seen, subjective impressions of the patient, etc.), conditions, and treatments. Reports are generated at regular intervals for medical staff and health authority managers. Typically, reports for medical staff focus on information about individual patients, whereas management reports are anonymized and are concerned with conditions, costs of treatment, etc.

The key features of the system are:

1. Individual care management Clinicians can create records for patients, edit the information in the system, view patient history, and so on. The system supports data summaries so that doctors who have not previously met a patient can quickly learn about the key problems and treatments that have been prescribed.

2. Patient monitoring The system regularly monitors the records of patients who are involved in treatment and issues warnings if possible problems are detected. Therefore, if a patient has not seen a doctor for some time, a warning may be issued. One of the most important elements of the monitoring system is to keep track of patients who have been sectioned and to ensure that the legally required checks are carried out at the right time.

3. Administrative reporting The system generates monthly management reports showing the number of patients treated at each clinic, the number of patients who have entered and left the care system, the number of patients sectioned, the drugs prescribed and their costs, etc.
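The patient monitoring feature described in point 2 amounts to periodically applying rules over patient records. A minimal sketch of two such rules, with thresholds and record fields invented for illustration:

```python
# Illustrative monitoring rules suggested by the Mentcare description:
# warn when a patient has not been seen recently, or when a legally
# required review of a sectioned patient is due. All thresholds and
# record fields here are invented for this sketch.
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)   # assumed maximum gap between visits

def monitoring_warnings(patients, today):
    """Scan patient records and return a list of warning messages."""
    warnings = []
    for p in patients:
        if today - p["last_seen"] > REVIEW_INTERVAL:
            warnings.append(f"{p['id']}: no consultation for over 90 days")
        if p.get("sectioned") and p["review_due"] <= today:
            warnings.append(f"{p['id']}: legally required review is due")
    return warnings
```

In the real system such checks would run on the server over the central database; the point of the sketch is only that "monitoring" is rule evaluation over records plus alerting.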

Two different laws affect the system: laws on data protection that govern the confidentiality of personal information, and mental health laws that govern the compulsory detention of patients deemed to be a danger to themselves or others. Mental health is unique in this respect as it is the only medical specialty that can recommend the detention of patients against their will. This is subject to strict legislative safeguards. One aim of the Mentcare system is to ensure that staff always act in accordance with the law and that their decisions are recorded for judicial review if necessary.

As in all medical systems, privacy is a critical system requirement. It is essential that patient information is confidential and is never disclosed to anyone apart from authorized medical staff and the patient themselves. The Mentcare system is also a safety-critical system. Some mental illnesses cause patients to become suicidal or a danger to other people. Wherever possible, the system should warn medical staff about potentially suicidal or dangerous patients.

The overall design of the system has to take into account privacy and safety requirements. The system must be available when needed; otherwise safety may be compromised, and it may be impossible to prescribe the correct medication to patients. There is a potential conflict here. Privacy is easiest to maintain when there is only a single copy of the system data. However, to ensure availability in the event of server failure or when disconnected from a network, multiple copies of the data should be maintained. I discuss the trade-offs between these requirements in later chapters.

1.3.3 A wilderness weather station

To help monitor climate change and to improve the accuracy of weather forecasts in remote areas, the government of a country with large areas of wilderness decides to deploy several hundred weather stations in remote areas. These weather stations collect data from a set of instruments that measure temperature and pressure, sunshine, rainfall, wind speed and wind direction.

Wilderness weather stations are part of a larger system (Figure 1.7), which is a weather information system that collects data from weather stations and makes it available to other systems for processing. The systems in Figure 1.7 are:

1. The weather station system This system is responsible for collecting weather data, carrying out some initial data processing, and transmitting it to the data management system.

2. The data management and archiving system This system collects the data from all of the wilderness weather stations, carries out data processing and analysis, and archives the data in a form that can be retrieved by other systems, such as weather forecasting systems.

Figure 1.7 The weather station's environment

3. The station maintenance system This system can communicate by satellite with all wilderness weather stations to monitor the health of these systems and provide reports of problems. It can update the embedded software in these systems. In the event of system problems, this system can also be used to remotely control the weather station.

In Figure 1.7, I have used the UML package symbol to indicate that each system is a collection of components, and the separate systems are identified using the UML stereotype «system». The associations between the packages indicate there is an exchange of information but, at this stage, there is no need to define them in any more detail.

The weather stations include instruments that measure weather parameters such as wind speed and direction, ground and air temperatures, barometric pressure, and rainfall over a 24-hour period. Each of these instruments is controlled by a software system that takes parameter readings periodically and manages the data collected from the instruments.

The weather station system operates by collecting weather observations at frequent intervals; for example, temperatures are measured every minute. However, because the bandwidth to the satellite is relatively narrow, the weather station carries out some local processing and aggregation of the data. It then transmits this aggregated data when requested by the data collection system. If it is impossible to make a connection, then the weather station maintains the data locally until communication can be resumed.
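The collect, aggregate, and transmit-on-request behavior described above can be sketched as follows. The class, its aggregation rules, and the satellite link interface are assumptions made for illustration; a real station would aggregate many parameters, not just temperature.

```python
# Sketch of a weather station's collect/aggregate/transmit cycle.
# Names, the aggregation scheme, and the link interface are invented.
from statistics import mean

class WeatherStation:
    def __init__(self, link):
        self.link = link       # narrow-bandwidth satellite link
        self.readings = []     # raw per-minute temperature readings
        self.pending = []      # aggregated summaries awaiting transmission

    def record(self, temperature):
        self.readings.append(temperature)

    def aggregate(self):
        """Summarize raw readings locally to reduce satellite traffic."""
        if self.readings:
            self.pending.append({"min": min(self.readings),
                                 "max": max(self.readings),
                                 "mean": round(mean(self.readings), 2)})
            self.readings = []

    def on_transmission_request(self):
        """Send summaries when asked; keep them if the link is down."""
        if self.link.is_up():
            for summary in self.pending:
                self.link.send(summary)
            self.pending = []  # sent successfully
        # else: summaries stay in self.pending until the link is available
```

The key design point the sketch captures is store-and-forward: nothing is ever discarded just because the link happens to be down.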

Each weather station is battery-powered and must be entirely self-contained; there are no external power or network cables. All communications are through a relatively slow satellite link, and the weather station must include some mechanism (solar or wind power) to charge its batteries. As they are deployed in wilderness areas, they are exposed to severe environmental conditions and may be damaged by animals. The station software is therefore not just concerned with data collection. It must also:

1. Monitor the instruments, power, and communication hardware and report faults to the management system.

2. Manage the system power, ensuring that batteries are charged whenever the environmental conditions permit but also that generators are shut down in potentially damaging weather conditions, such as high wind.

3. Allow for dynamic reconfiguration where parts of the software are replaced with new versions and where backup instruments are switched into the system in the event of system failure.

Because weather stations have to be self-contained and unattended, the software installed is complex, even though the data collection functionality is fairly simple.

1.3.4 A digital learning environment for schools

Many teachers argue that using interactive software systems to support education can lead to both improved learner motivation and a deeper level of knowledge and understanding in students. However, there is no general agreement on the ‘best’ strategy for computer-supported learning, and teachers in practice use a range of different interactive, web-based tools to support learning. The tools used depend on the ages of the learners, their cultural background, their experience with computers, equipment available, and the preferences of the teachers involved.

A digital learning environment is a framework in which a set of general-purpose and specially designed tools for learning may be embedded, plus a set of applications that are geared to the needs of the learners using the system. The framework provides general services such as an authentication service, synchronous and asynchronous communication services, and a storage service.

The tools included in each version of the environment are chosen by teachers and learners to suit their specific needs. These can be general applications such as spreadsheets, learning management applications such as a Virtual Learning Environment (VLE) to manage homework submission and assessment, games, and simulations. They may also include specific content, such as content about the American Civil War and applications to view and annotate that content.

Figure 1.8 is a high-level architectural model of a digital learning environment (iLearn) that was designed for use in schools for students from 3 to 18 years of age. The approach adopted is that this is a distributed system in which all components of the environment are services that can be accessed from anywhere on the Internet. There is no requirement that all of the learning tools are gathered together in one place.

The system is a service-oriented system with all system components considered to be a replaceable service. There are three types of service in the system:

1. Utility services that provide basic application-independent functionality and that may be used by other services in the system. Utility services are usually developed or adapted specifically for this system.

2. Application services that provide specific applications such as email, conferencing, photo sharing, etc., and access to specific educational content such as scientific films or historical resources. Application services are external services that are either specifically purchased for the system or are available freely over the Internet.

Figure 1.8 The architecture of a digital learning environment (iLearn)

3. Configuration services that are used to adapt the environment with a specific set of application services and to define how services are shared between students, teachers, and their parents.

The environment has been designed so that services can be replaced as new services become available and to provide different versions of the system that are suited for the age of the users. This means that the system has to support two levels of service integration:

1. Integrated services are services that offer an API (application programming interface) and that can be accessed by other services through that API. Direct service-to-service communication is therefore possible. An authentication service is an example of an integrated service. Rather than use their own authentication mechanisms, an authentication service may be called on by other services to authenticate users. If users are already authenticated, then the authentication service may pass authentication information directly to another service, via an API, with no need for users to reauthenticate themselves.

2. Independent services are services that are simply accessed through a browser interface and that operate independently of other services. Information can only be shared with other services through explicit user actions such as copy and paste; reauthentication may be required for each independent service.
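The integrated authentication pattern in point 1 can be sketched as follows: a service validates a user's token with the shared authentication service through its API instead of re-authenticating the user itself. All class, method, and service names here are invented for illustration, not taken from iLearn.

```python
# Hypothetical sketch of the integrated-service authentication pattern.
# A shared authentication service issues opaque tokens; other services
# call its API to validate them, so users sign in only once.
import secrets

class AuthenticationService:
    def __init__(self):
        self._tokens = {}                 # token -> user id

    def log_in(self, user_id, password_ok):
        """Issue a session token if the user's credentials checked out."""
        if not password_ok:
            return None
        token = secrets.token_hex(16)     # unguessable opaque token
        self._tokens[token] = user_id
        return token

    def validate(self, token):
        """API used by other services; returns the user id or None."""
        return self._tokens.get(token)

class HomeworkService:
    """An integrated service that delegates authentication."""
    def __init__(self, auth):
        self.auth = auth                  # shared authentication service

    def submit(self, token, work):
        user = self.auth.validate(token)  # no reauthentication of the user
        if user is None:
            raise PermissionError("not authenticated")
        return f"stored submission from {user}"
```

An independent service, by contrast, would have no `auth` reference at all and would prompt the user to log in again through its own browser interface.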

If an independent service becomes widely used, the development team may then integrate that service so that it becomes an integrated and supported service.


Key Points

Software engineering is an engineering discipline that is concerned with all aspects of software production.

Software is not just a program or programs but also includes all electronic documentation that is needed by system users, quality assurance staff, and developers. Essential software product attributes are maintainability, dependability and security, efficiency, and acceptability.

The software process includes all of the activities involved in software development. The high-level activities of specification, development, validation, and evolution are part of all software processes.

There are many different types of system, and each requires appropriate software engineering tools and techniques for their development. Few, if any, specific design and implementation techniques are applicable to all kinds of system.

The fundamental ideas of software engineering are applicable to all types of software system. These fundamentals include managed software processes, software dependability and security, requirements engineering, and software reuse.

Software engineers have responsibilities to the engineering profession and society. They should not simply be concerned with technical issues but should be aware of the ethical issues that affect their work. Professional societies publish codes of conduct that embed ethical and professional standards. These set out the standards of behavior expected of their members.

Further reading

“Software Engineering Code of Ethics Is Approved.” An article that discusses the background to the development of the ACM/IEEE Code of Ethics and that includes both the short and long form of the code. (Comm. ACM, D. Gotterbarn, K. Miller, and S. Rogerson, October 1999). http://dx.doi.org/10.1109/MC.1999.796142

“A View of 20th and 21st Century Software Engineering.” A backward and forward look at software engineering from one of the first and most distinguished software engineers. Barry Boehm identifies timeless software engineering principles but also suggests that some commonly used practices are obsolete. (B. Boehm, Proc. 28th Software Engineering Conf., Shanghai, 2006). http://dx.doi.org/10.1145/1134285.1134288

“Software Engineering Ethics.” Special issue of IEEE Computer, with several papers on the topic (IEEE Computer, 42 (6), June 2009).

Ethics for the Information Age. This is a wide-ranging book that covers all aspects of information technology (IT) ethics, not simply ethics for software engineers. I think this is the right approach as you really need to understand software engineering ethics within a wider ethical framework (M. J. Quinn, 2013, Addison-Wesley).

The Essence of Software Engineering: Applying the SEMAT Kernel. This book discusses the idea of a universal framework that can underlie all software engineering methods. It can be adapted and used for all types of systems and organizations. I am personally skeptical about whether or not a universal approach is realistic in practice, but the book has some interesting ideas that are worth exploring. (I. Jacobson, P-W. Ng, P. E. McMahon, I. Spence, and S. Lidman, 2013, Addison-Wesley)

Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/software-engineering/

Links to case study descriptions: http://software-engineering-book.com/case-studies/

Exercises

1.1. Explain why professional software that is developed for a customer is not simply the programs that have been developed and delivered.

1.2. What is the most important difference between generic software product development and custom software development? What might this mean in practice for users of generic software products?

1.3. Briefly discuss why it is usually cheaper in the long run to use software engineering methods and techniques for software systems.

1.4. Software engineering is not only concerned with issues like system heterogeneity, business and social change, trust, and security, but also with ethical issues affecting the domain. Give some examples of ethical issues that have an impact on the software engineering domain.

1.5. Based on your own knowledge of some of the application types discussed in Section 1.1.2, explain, with examples, why different application types require specialized software engineering techniques to support their design and development.

1.6. Explain why the fundamental software engineering principles of process, dependability, requirements management, and reuse are relevant to all types of software system.

1.7. Explain how electronic connectivity between various development teams can support software engineering activities.

1.8. Noncertified individuals are still allowed to practice software engineering. Discuss some of the possible drawbacks of this.


1.9. For each of the clauses in the ACM/IEEE Code of Ethics shown in Figure 1.3, propose an appropriate example that illustrates that clause.

1.10. The “Drone Revolution” is currently being debated and discussed all over the world. Drones are unmanned flying machines that are built and equipped with various kinds of software systems that allow them to see, hear, and act. Discuss some of the societal challenges of building such kinds of systems.

References

Bott, F. 2005. Professional Issues in Information Technology. Swindon, UK: British Computer Society.

Duquenoy, P. 2007. Ethical, Legal and Professional Issues in Computing. London: Thomson Learning.

Freeman, A. 2011. The Definitive Guide to HTML5. New York: Apress.

Gotterbarn, D., K. Miller, and S. Rogerson. 1999. “Software Engineering Code of Ethics Is Approved.” Comm. ACM 42 (10): 102–107. doi:10.1109/MC.1999.796142.

Holdener, A. T. 2008. Ajax: The Definitive Guide. Sebastopol, CA: O’Reilly and Associates.

Jacobson, I., P-W. Ng, P. E. McMahon, I. Spence, and S. Lidman. 2013. The Essence of Software Engineering. Boston: Addison-Wesley.

Johnson, D. G. 2001. Computer Ethics. Englewood Cliffs, NJ: Prentice-Hall.

Laudon, K. 1995. “Ethical Concepts and Information Technology.” Comm. ACM 38 (12): 33–39. doi:10.1145/219663.219677.

Naur, P., and B. Randell. 1969. Software Engineering: Report on a Conference Sponsored by the NATO Science Committee. Brussels. http://homepages.cs.ncl.ac.uk/brian.randell/NATO/nato1968.pdf

Tavani, H. T. 2013. Ethics and Technology: Controversies, Questions, and Strategies for Ethical Computing, 4th ed. New York: John Wiley & Sons.

Vogel, L. 2012. Eclipse 4 Application Development: The Complete Guide to Eclipse 4 RCP Development. Sebastopol, CA: O’Reilly & Associates.

2

Software processes

Objectives

The objective of this chapter is to introduce you to the idea of a software process—a coherent set of activities for software production. When you have read this chapter, you will:

- understand the concepts of software processes and software process models;
- have been introduced to three general software process models and when they might be used;
- know about the fundamental process activities of software requirements engineering, software development, testing, and evolution;
- understand why processes should be organized to cope with changes in the software requirements and design;
- understand the notion of software process improvement and the factors that affect software process quality.

Contents

2.1 Software process models

2.2 Process activities

2.3 Coping with change

2.4 Process improvement


A software process is a set of related activities that leads to the production of a software system. As I discussed in Chapter 1, there are many different types of software systems, and there is no universal software engineering method that is applicable to all of them. Consequently, there is no universally applicable software process. The process used in different companies depends on the type of software being developed, the requirements of the software customer, and the skills of the people writing the software.

However, although there are many different software processes, they all must include, in some form, the four fundamental software engineering activities that I introduced in Chapter 1:

1. Software specification The functionality of the software and constraints

on its operation must be defined.

2. Software development The software to meet the specification must be

produced.

3. Software validation The software must be validated to ensure that it does

what the customer wants.

4. Software evolution The software must evolve to meet changing customer

needs.

These activities are complex activities in themselves, and they include subactivities such as requirements validation, architectural design, and unit testing. Processes also include other activities, such as software configuration management and project planning, that support production activities.

When we describe and discuss processes, we usually talk about the activities in these processes, such as specifying a data model and designing a user interface, and the ordering of these activities. We can all relate to what people do to develop software. However, when describing processes, it is also important to describe who is involved, what is produced, and conditions that influence the sequence of activities:

1. Products or deliverables are the outcomes of a process activity. For example, the outcome of the activity of architectural design may be a model of the software architecture.

2. Roles reflect the responsibilities of the people involved in the process.

Examples of roles are project manager, configuration manager, and

programmer.

3. Pre- and postconditions are conditions that must hold before and after a process activity has been enacted or a product produced. For example, before architectural design begins, a precondition may be that the customer has approved all requirements; after this activity is finished, a postcondition might be that the UML models describing the architecture have been reviewed.
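The idea of roles, deliverables, and pre- and postconditions can be made concrete in code. The sketch below is illustrative only (the class, names, and the review flag are invented, not from the book): it models a process activity whose enactment is guarded by a precondition and checked against a postcondition.

```python
# Hypothetical sketch of a process activity with pre- and postconditions.
class ProcessActivity:
    def __init__(self, name, role, precondition, postcondition):
        self.name = name
        self.role = role                    # who is responsible for the activity
        self.precondition = precondition    # must hold before enactment
        self.postcondition = postcondition  # must hold after enactment

    def enact(self, project, produce):
        # The precondition must hold before the activity is enacted ...
        assert self.precondition(project), f"precondition failed for {self.name}"
        deliverable = produce(project)      # ... the deliverable is the outcome ...
        project[self.name] = deliverable
        # ... and the postcondition must hold afterwards.
        assert self.postcondition(project), f"postcondition failed for {self.name}"
        return deliverable

# Example: architectural design may only begin once the customer has approved
# all requirements, and must end with reviewed UML models.
design = ProcessActivity(
    name="architectural design",
    role="architect",
    precondition=lambda p: p.get("requirements_approved", False),
    postcondition=lambda p: p["architectural design"]["reviewed"],
)

project = {"requirements_approved": True}
model = design.enact(project, lambda p: {"uml_models": 3, "reviewed": True})
print(model["reviewed"])   # True
```

If the customer had not approved the requirements, `enact` would refuse to run the activity, which is exactly the role a precondition plays in a defined process.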

Software processes are complex and, like all intellectual and creative processes, rely on people making decisions and judgments. As there is no universal process that is right for all kinds of software, most software companies have developed their own development processes. Processes have evolved to take advantage of the capabilities of the software developers in an organization and the characteristics of the systems that are being developed. For safety-critical systems, a very structured development process is required where detailed records are maintained. For business systems, with rapidly changing requirements, a more flexible, agile process is likely to be better.

As I discussed in Chapter 1, professional software development is a managed activity, so planning is an inherent part of all processes. Plan-driven processes are processes where all of the process activities are planned in advance and progress is measured against this plan. In agile processes, which I discuss in Chapter 3, planning is incremental and continual as the software is developed. It is therefore easier to change the process to reflect changing customer or product requirements. As Boehm and Turner (Boehm and Turner 2004) explain, each approach is suitable for different types of software. Generally, for large systems, you need to find a balance between plan-driven and agile processes.

Although there is no universal software process, there is scope for process improvement in many organizations. Processes may include outdated techniques or may not take advantage of the best practice in industrial software engineering. Indeed, many organizations still do not take advantage of software engineering methods in their software development. They can improve their process by introducing techniques such as UML modeling and test-driven development. I discuss software process improvement briefly later in this chapter and in more detail in web Chapter 26.

2.1 Software process models

As I explained in Chapter 1, a software process model (sometimes called a Software Development Life Cycle or SDLC model) is a simplified representation of a software process. Each process model represents a process from a particular perspective and thus only provides partial information about that process. For example, a process activity model shows the activities and their sequence but may not show the roles of the people involved in these activities. In this section, I introduce a number of very general process models (sometimes called process paradigms) and present these from an architectural perspective. That is, we see the framework of the process but not the details of process activities.

These generic models are high-level, abstract descriptions of software processes that can be used to explain different approaches to software development. You can think of them as process frameworks that may be extended and adapted to create more specific software engineering processes.

The general process models that I cover here are:

1. The waterfall model This takes the fundamental process activities of specification, development, validation, and evolution and represents them as separate process phases such as requirements specification, software design, implementation, and testing.


The Rational Unified Process

The Rational Unified Process (RUP) brings together elements of all of the general process models discussed here and supports prototyping and incremental delivery of software (Krutchen 2003). The RUP is normally described from three perspectives: a dynamic perspective that shows the phases of the model in time, a static perspective that shows process activities, and a practice perspective that suggests good practices to be used in the process. Phases of the RUP are inception, where a business case for the system is established; elaboration, where requirements and architecture are developed; construction, where the software is implemented; and transition, where the system is deployed.

http://software-engineering-book.com/web/rup/

2. Incremental development This approach interleaves the activities of

specification, development, and validation. The system is developed as a

series of versions

(increments), with each version adding functionality to the previous

version.

3. Integration and configuration This approach relies on the availability of

reusable components or systems. The system development process focuses

on

configuring these components for use in a new setting and integrating

them

into a system.

As I have said, there is no universal process model that is right for all kinds of software development. The right process depends on the customer and regulatory requirements, the environment where the software will be used, and the type of software being developed. For example, safety-critical software is usually developed using a waterfall process as lots of analysis and documentation is required before implementation begins. Software products are now always developed using an incremental process model. Business systems are increasingly being developed by configuring existing systems and integrating these to create a new system with the functionality that is required.

The majority of practical software processes are based on a general model but often incorporate features of other models. This is particularly true for large systems engineering. For large systems, it makes sense to combine some of the best features of all of the general processes. You need to have information about the essential system requirements to design a software architecture to support these requirements. You cannot develop this incrementally. Subsystems within a larger system may be developed using different approaches. Parts of the system that are well understood can be specified and developed using a waterfall-based process or may be bought in as off-the-shelf systems for configuration. Other parts of the system, which are difficult to specify in advance, should always be developed using an incremental approach. In both cases, software components are likely to be reused.

Various attempts have been made to develop “universal” process models that draw on all of these general models. One of the best known of these universal models is the Rational Unified Process (RUP) (Krutchen 2003), which was developed by Rational, a U.S. software engineering company. The RUP is a flexible model that

2.1 Software process models 47

[Figure 2.1 The waterfall model: requirements definition → system and software design → implementation and unit testing → integration and system testing → operation and maintenance]

can be instantiated in different ways to create processes that resemble any of the general process models discussed here. The RUP has been adopted by some large software companies (notably IBM), but it has not gained widespread acceptance.

2.1.1 The waterfall model

The first published model of the software development process was derived from engineering process models used in large military systems engineering (Royce 1970). It presents the software development process as a number of stages, as shown in Figure 2.1. Because of the cascade from one phase to another, this model is known as the waterfall model or software life cycle. The waterfall model is an example of a plan-driven process. In principle at least, you plan and schedule all of the process activities before starting software development.

The stages of the waterfall model directly reflect the fundamental software development activities:

1. Requirements analysis and definition The system’s services, constraints, and goals are established by consultation with system users. They are then defined in detail and serve as a system specification.

2. System and software design The systems design process allocates the requirements to either hardware or software systems. It establishes an overall system architecture. Software design involves identifying and describing the fundamental software system abstractions and their relationships.

3. Implementation and unit testing During this stage, the software design is realized as a set of programs or program units. Unit testing involves verifying that each unit meets its specification.


Boehm’s spiral process model

Barry Boehm, one of the pioneers in software engineering, proposed an incremental process model that was risk-driven. The process is represented as a spiral rather than a sequence of activities (Boehm 1988).

Each loop in the spiral represents a phase of the software process. Thus, the innermost loop might be concerned with system feasibility, the next loop with requirements definition, the next loop with system design, and so on. The spiral model combines change avoidance with change tolerance. It assumes that changes are a result of project risks and includes explicit risk management activities to reduce these risks.

http://software-engineering-book.com/web/spiral-model/

4. Integration and system testing The individual program units or programs are integrated and tested as a complete system to ensure that the software requirements have been met. After testing, the software system is delivered to the customer.

5. Operation and maintenance Normally, this is the longest life-cycle phase. The system is installed and put into practical use. Maintenance involves correcting errors that were not discovered in earlier stages of the life cycle, improving the implementation of system units, and enhancing the system’s services as new requirements are discovered.
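The strict phase sequencing described above can be sketched as a toy pipeline (invented for illustration; the waterfall model itself prescribes no code): each phase yields a signed-off document, and a phase may only start once every earlier phase has finished.

```python
# Illustrative sketch of the waterfall model as a strict sequence of phases.
WATERFALL_PHASES = [
    "requirements analysis and definition",
    "system and software design",
    "implementation and unit testing",
    "integration and system testing",
    "operation and maintenance",
]

def run_waterfall(phases):
    signed_off = []
    for phase in phases:
        # In principle, every earlier phase must be complete (signed off)
        # before this one starts.
        assert len(signed_off) == phases.index(phase)
        document = f"{phase} document"   # the outcome: an approved document
        signed_off.append(document)
    return signed_off

docs = run_waterfall(WATERFALL_PHASES)
print(len(docs))   # 5
```

The rigidity is the point: nothing here allows feedback from a later phase to an earlier one, which is exactly the limitation the following paragraphs discuss.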

In principle, the result of each phase in the waterfall model is one or more documents that are approved (“signed off”). The following phase should not start until the previous phase has finished. For hardware development, where high manufacturing costs are involved, this makes sense. However, for software development, these stages overlap and feed information to each other. During design, problems with requirements are identified; during coding, design problems are found, and so on. The software process, in practice, is never a simple linear model but involves feedback from one phase to another.

As new information emerges in a process stage, the documents produced at previous stages should be modified to reflect the required system changes. For example, if it is discovered that a requirement is too expensive to implement, the requirements document should be changed to remove that requirement. However, this requires customer approval and delays the overall development process.

As a result, both customers and developers may prematurely freeze the software specification so that no further changes are made to it. Unfortunately, this means that problems are left for later resolution, ignored, or programmed around. Premature freezing of requirements may mean that the system won’t do what the user wants. It may also lead to badly structured systems as design problems are circumvented by implementation tricks.

During the final life-cycle phase (operation and maintenance) the software is put into use. Errors and omissions in the original software requirements are discovered. Program and design errors emerge, and the need for new functionality is identified. The system must therefore evolve to remain useful. Making these changes (software maintenance) may involve repeating previous process stages.

In reality, software has to be flexible and accommodate change as it is being developed. The need for early commitment and system rework when changes are made means that the waterfall model is only appropriate for some types of system:

1. Embedded systems where the software has to interface with hardware systems. Because of the inflexibility of hardware, it is not usually possible to delay decisions on the software’s functionality until it is being implemented.

2. Critical systems where there is a need for extensive safety and security analysis of the software specification and design. In these systems, the specification and design documents must be complete so that this analysis is possible. Safety-related problems in the specification and design are usually very expensive to correct at the implementation stage.

3. Large software systems that are part of broader engineering systems developed by several partner companies. The hardware in the systems may be developed using a similar model, and companies find it easier to use a common model for hardware and software. Furthermore, where several companies are involved, complete specifications may be needed to allow for the independent development of different subsystems.

The waterfall model is not the right process model in situations where informal team communication is possible and software requirements change quickly. Iterative development and agile methods are better for these systems.

An important variant of the waterfall model is formal system development, where a mathematical model of a system specification is created. This model is then refined, using mathematical transformations that preserve its consistency, into executable code. Formal development processes, such as that based on the B method (Abrial 2005, 2010), are mostly used in the development of software systems that have stringent safety, reliability, or security requirements. The formal approach simplifies the production of a safety or security case. This demonstrates to customers or regulators that the system actually meets its safety or security requirements. However, because of the high costs of developing a formal specification, this development model is rarely used except for critical systems engineering.
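The B method itself uses formal notation and proof tools, but the flavour of refinement, an abstract specification preserved by an executable implementation, can be hinted at with a toy example (invented here, not taken from Abrial): an integer square-root function checked against its mathematical specification.

```python
# Toy illustration of specification vs. refinement (not the B method itself).
def isqrt_spec(x, r):
    # The mathematical specification: r is the integer square root of x.
    return r >= 0 and r * r <= x < (r + 1) * (r + 1)

def isqrt(x):
    # An executable refinement of the specification.
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r

# Check that the implementation preserves the specification on a sample range.
for x in range(50):
    assert isqrt_spec(x, isqrt(x))
print(isqrt(17))   # 4
```

In a real formal development the consistency between specification and code is established by proof over all inputs, not by testing a sample range as done here.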

2.1.2 Incremental development

Incremental development is based on the idea of developing an initial implementation, getting feedback from users and others, and evolving the software through several versions until the required system has been developed (Figure 2.2). Specification, development, and validation activities are interleaved rather than separate, with rapid feedback across activities.
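As a rough sketch (invented for illustration, not part of the book's model), this interleaving can be pictured as a loop that grows the system one increment at a time until user feedback says the system is good enough:

```python
# Illustrative sketch of incremental development: versions grow feature by
# feature, and development stops once users accept the current version.
def develop_incrementally(outline, accepted):
    version = []          # the evolving system: a list of delivered features
    versions = []         # the series of versions produced
    for feature in outline:
        # specify, develop, and validate one increment, with rapid feedback
        version = version + [feature]
        versions.append(list(version))
        if accepted(version):
            break
    return versions

outline = ["login", "search", "reports", "admin"]
# Suppose users are satisfied once the three most important features exist.
history = develop_incrementally(outline, accepted=lambda v: len(v) >= 3)
print(len(history))   # 3
print(history[-1])    # ['login', 'search', 'reports']
```

Note how the early increments carry the most important functionality, so a useful system exists well before the full outline is implemented.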


[Figure 2.2 Incremental development: concurrent specification, development, and validation activities turn an outline description into an initial version, intermediate versions, and a final version]

Incremental development in some form is now the most common approach for the development of application systems and software products. This approach can be either plan-driven, agile or, more usually, a mixture of these approaches. In a plan-driven approach, the system increments are identified in advance; if an agile approach is adopted, the early increments are identified, but the development of later increments depends on progress and customer priorities.

Incremental software development, which is a fundamental part of agile development methods, is better than a waterfall approach for systems whose requirements are likely to change during the development process. This is the case for most business systems and software products. Incremental development reflects the way that we solve problems. We rarely work out a complete problem solution in advance but move toward a solution in a series of steps, backtracking when we realize that we have made a mistake. By developing the software incrementally, it is cheaper and easier to make changes in the software as it is being developed.

Each increment or version of the system incorporates some of the functionality that is needed by the customer. Generally, the early increments of the system include the most important or most urgently required functionality. This means that the customer or user can evaluate the system at a relatively early stage in the development to see if it delivers what is required. If not, then only the current increment has to be changed and, possibly, new functionality defined for later increments.

Incremental development has three major advantages over the waterfall model:

1. The cost of implementing requirements changes is reduced. The amount of analysis and documentation that has to be redone is significantly less than is required with the waterfall model.

2. It is easier to get customer feedback on the development work that has been done. Customers can comment on demonstrations of the software and see how much has been implemented. Customers find it difficult to judge progress from software design documents.

Problems with incremental development

Although incremental development has many advantages, it is not problem free. The primary cause of the difficulty is the fact that large organizations have bureaucratic procedures that have evolved over time and there may be a mismatch between these procedures and a more informal iterative or agile process.

Sometimes these procedures are there for good reasons. For example, there may be procedures to ensure that the software properly implements external regulations (e.g., in the United States, the Sarbanes-Oxley accounting regulations). Changing these procedures may not be possible, so process conflicts may be unavoidable.

http://software-engineering-book.com/web/incremental-development/

3. Early delivery and deployment of useful software to the customer is possible, even if all of the functionality has not been included. Customers are able to use and gain value from the software earlier than is possible with a waterfall process.

From a management perspective, the incremental approach has two problems:

1. The process is not visible. Managers need regular deliverables to measure progress. If systems are developed quickly, it is not cost effective to produce documents that reflect every version of the system.

2. System structure tends to degrade as new increments are added. Regular change leads to messy code as new functionality is added in whatever way is possible. It becomes increasingly difficult and costly to add new features to a system. To reduce structural degradation and general code messiness, agile methods suggest that you should regularly refactor (improve and restructure) the software.

The problems of incremental development become particularly acute for large, complex, long-lifetime systems, where different teams develop different parts of the system. Large systems need a stable framework or architecture, and the responsibilities of the different teams working on parts of the system need to be clearly defined with respect to that architecture. This has to be planned in advance rather than developed incrementally.

Incremental development does not mean that you have to deliver each increment to the system customer. You can develop a system incrementally and expose it to customers and other stakeholders for comment, without necessarily delivering it and deploying it in the customer’s environment. Incremental delivery (covered in Section 2.3.2) means that the software is used in real, operational processes, so user feedback is likely to be realistic. However, providing feedback is not always possible as experimenting with new software can disrupt normal business processes.


[Figure 2.3 Reuse-oriented software engineering: requirements specification → software discovery → software evaluation → requirements refinement; then, if an application system is available, configure the application system; if components are available, adapt components, develop new components, and integrate the system]

2.1.3 Integration and configuration

In the majority of software projects, there is some software reuse. This often happens informally when people working on the project know of or search for code that is similar to what is required. They look for these, modify them as needed, and integrate them with the new code that they have developed.

This informal reuse takes place regardless of the development process that is used. However, since 2000, software development processes that focus on the reuse of existing software have become widely used. Reuse-oriented approaches rely on a base of reusable software components and an integrating framework for the composition of these components.

Three types of software components are frequently reused:

1. Stand-alone application systems that are configured for use in a particular environment. These systems are general-purpose systems that have many features, but they have to be adapted for use in a specific application.

2. Collections of objects that are developed as a component or as a package to be integrated with a component framework such as the Java Spring framework (Wheeler and White 2013).

3. Web services that are developed according to service standards and that are available for remote invocation over the Internet.

Figure 2.3 shows a general process model for reuse-based development, based on integration and configuration. The stages in this process are:

1. Requirements specification The initial requirements for the system are proposed. These do not have to be elaborated in detail but should include brief descriptions of essential requirements and desirable system features.

2. Software discovery and evaluation Given an outline of the software requirements, a search is made for components and systems that provide the functionality required. Candidate components and systems are evaluated to see if they meet the essential requirements and if they are generally suitable for use in the system.

Software development tools

Software development tools are programs that are used to support software engineering process activities. These tools include requirements management tools, design editors, refactoring support tools, compilers, debuggers, bug trackers, and system building tools.

Software tools provide process support by automating some process activities and by providing information about the software that is being developed. For example:

- The development of graphical system models as part of the requirements specification or the software design
- The generation of code from these graphical models
- The generation of user interfaces from a graphical interface description that is created interactively by the user
- Program debugging through the provision of information about an executing program
- The automated translation of programs written using an old version of a programming language to a more recent version

Tools may be combined within a framework called an Interactive Development Environment or IDE. This provides a common set of facilities that tools can use so that it is easier for tools to communicate and operate in an integrated way.

http://software-engineering-book.com/web/software-tools/

3. Requirements refinement During this stage, the requirements are refined using information about the reusable components and applications that have been discovered. The requirements are modified to reflect the available components, and the system specification is re-defined. Where modifications are impossible, the component analysis activity may be reentered to search for alternative solutions.

4. Application system configuration If an off-the-shelf application system that meets the requirements is available, it may then be configured for use to create the new system.

5. Component adaptation and integration If there is no off-the-shelf system, individual reusable components may be modified and new components developed. These are then integrated to create the system.
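The decision logic running through these stages can be caricatured in a few lines of code (the catalogue and matching rule are invented for illustration, not part of the book's model): discovery looks each requirement up in a base of reusable software, and the result decides whether to configure, adapt, or develop from scratch.

```python
# Hypothetical catalogue of reusable software, invented for this sketch.
CATALOGUE = {
    "payments": {"kind": "web service"},
    "reporting": {"kind": "application system", "configurable": True},
}

def build_by_reuse(requirements):
    plan = []
    for req in requirements:
        found = CATALOGUE.get(req)                      # software discovery
        if found is None:
            plan.append((req, "develop new component"))         # no reuse possible
        elif found.get("configurable"):
            plan.append((req, "configure application system"))  # off-the-shelf
        else:
            plan.append((req, "adapt and integrate component"))
    return plan

plan = build_by_reuse(["payments", "reporting", "audit-trail"])
print(plan)
```

Real discovery and evaluation involve judgment about partial matches and requirements compromises, which is why the process loops back to requirements refinement rather than following a simple lookup like this one.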

Reuse-oriented software engineering, based around configuration and integration, has the obvious advantage of reducing the amount of software to be developed and so reducing cost and risks. It usually also leads to faster delivery of the software. However, requirements compromises are inevitable, and this may lead to a system that does not meet the real needs of users. Furthermore, some control over the system evolution is lost as new versions of the reusable components are not under the control of the organization using them.

Software reuse is very important, so I have dedicated several chapters in the third part of the book to this topic. General issues of software reuse are covered in Chapter 15, component-based software engineering in Chapters 16 and 17, and service-oriented systems in Chapter 18.

2.2 Process activities

Real software processes are interleaved sequences of technical, collaborative, and managerial activities with the overall goal of specifying, designing, implementing, and testing a software system. Generally, processes are now tool-supported. This means that software developers may use a range of software tools to help them, such as requirements management systems, design model editors, program editors, automated testing tools, and debuggers.

The four basic process activities of specification, development, validation, and evolution are organized differently in different development processes. In the waterfall model, they are organized in sequence, whereas in incremental development they are interleaved. How these activities are carried out depends on the type of software being developed, the experience and competence of the developers, and the type of organization developing the software.

2.2.1 Software specification

Software specification or requirements engineering is the process of understanding and defining what services are required from the system and identifying the constraints on the system’s operation and development. Requirements engineering is a particularly critical stage of the software process, as mistakes made at this stage inevitably lead to later problems in the system design and implementation.

Before the requirements engineering process starts, a company may carry out a feasibility or marketing study to assess whether or not there is a need or a market for the software and whether or not it is technically and financially realistic to develop the software required. Feasibility studies are short-term, relatively cheap studies that inform the decision of whether or not to go ahead with a more detailed analysis.

The requirements engineering process (Figure 2.4) aims to produce an agreed requirements document that specifies a system satisfying stakeholder requirements. Requirements are usually presented at two levels of detail. End-users and customers need a high-level statement of the requirements; system developers need a more detailed system specification.


[Figure 2.4 The requirements engineering process: requirements elicitation and analysis → requirements specification → requirements validation, drawing on system descriptions and producing user and system requirements and the requirements document]

There are three main activities in the requirements engineering process:

1. Requirements elicitation and analysis This is the process of deriving the system requirements through observation of existing systems, discussions with potential users and procurers, task analysis, and so on. This may involve the development of one or more system models and prototypes. These help you understand the system to be specified.

2. Requirements specification Requirements specification is the activity of translating the information gathered during requirements analysis into a document that defines a set of requirements. Two types of requirements may be included in this document. User requirements are abstract statements of the system requirements for the customer and end-user of the system; system requirements are a more detailed description of the functionality to be provided.

3. Requirements validation This activity checks the requirements for realism, consistency, and completeness. During this process, errors in the requirements document are inevitably discovered. It must then be modified to correct these problems.
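As a toy illustration of requirements validation (the requirement identifiers and the traceability rule are invented here, not taken from the book), a checker might flag consistency and completeness problems between the two levels of requirements:

```python
# Invented example: user requirements and the system requirements that
# are supposed to elaborate them, linked by hierarchical identifiers.
user_reqs = {"R1": "The system shall let staff search patient records."}
system_reqs = {
    "R1.1": "Search shall match on name and date of birth.",
    "R2.1": "Reports shall be exportable as PDF.",
}

def validate(user_reqs, system_reqs):
    problems = []
    # Consistency: every detailed requirement must trace to a user requirement.
    for sys_id in system_reqs:
        parent = sys_id.split(".")[0]
        if parent not in user_reqs:
            problems.append(f"{sys_id} traces to no user requirement")
    # Completeness: every user requirement must be elaborated somewhere.
    for user_id in user_reqs:
        if not any(s.startswith(user_id + ".") for s in system_reqs):
            problems.append(f"{user_id} has no detailed system requirements")
    return problems

print(validate(user_reqs, system_reqs))   # ['R2.1 traces to no user requirement']
```

Realism, the third property named above, cannot be checked mechanically like this; it needs human judgment about cost and feasibility.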

Requirements analysis continues during definition and specification, and new requirements come to light throughout the process. Therefore, the activities of analysis, definition, and specification are interleaved.

In agile methods, requirements specification is not a separate activity but

is seen

as part of system development. Requirements are informally specified for

each

increment of the system just before that increment is developed.

Requirements are

specified according to user priorities. The elicitation of requirements

comes from

users who are part of or work closely with the development team.

Figure 2.5 A general model of the design process (inputs: platform information, software requirements, data descriptions; activities: architectural design, interface design, database design, component selection and design; outputs: system architecture, database design, interface specification, component descriptions)

2.2.2 Software design and implementation

The implementation stage of software development is the process of developing an executable system for delivery to the customer. Sometimes this involves separate activities of software design and programming. However, if an agile approach to development is used, design and implementation are interleaved, with no formal design documents produced during the process. Of course, the software is still designed, but the design is recorded informally on whiteboards and programmer's notebooks.

A software design is a description of the structure of the software to be implemented, the data models and structures used by the system, the interfaces between system components and, sometimes, the algorithms used. Designers do not arrive at a finished design immediately but develop the design in stages. They add detail as they develop their design, with constant backtracking to modify earlier designs.

Figure 2.5 is an abstract model of the design process showing the inputs to the design process, process activities, and the process outputs. The design process activities are both interleaved and interdependent. New information about the design is constantly being generated, and this affects previous design decisions. Design rework is therefore inevitable.


Most software interfaces with other software systems. These other systems include the operating system, database, middleware, and other application systems. These make up the "software platform," the environment in which the software will execute. Information about this platform is an essential input to the design process, as designers must decide how best to integrate it with its environment. If the system is to process existing data, then the description of that data may be included in the platform specification. Otherwise, the data description must be an input to the design process so that the system data organization can be defined.

The activities in the design process vary, depending on the type of system being developed. For example, real-time systems require an additional stage of timing design but may not include a database, so there is no database design involved. Figure 2.5 shows four activities that may be part of the design process for information systems:

1. Architectural design, where you identify the overall structure of the system, the principal components (sometimes called subsystems or modules), their relationships, and how they are distributed.

2. Database design, where you design the system data structures and how these are to be represented in a database. Again, the work here depends on whether an existing database is to be reused or a new database is to be created.

3. Interface design, where you define the interfaces between system components. This interface specification must be unambiguous. With a precise interface, a component may be used by other components without them having to know how it is implemented. Once interface specifications are agreed, the components can be separately designed and developed.

4. Component selection and design, where you search for reusable components and, if no suitable components are available, design new software components. The design at this stage may be a simple component description with the implementation details left to the programmer. Alternatively, it may be a list of changes to be made to a reusable component or a detailed design model expressed in the UML. The design model may then be used to automatically generate an implementation.
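The idea behind interface design can be sketched in code. In this minimal example, the interface, component, and all names (WeatherStore, station identifiers) are invented for illustration; the point is only that clients depend on the agreed interface, not on any implementation.

```python
from abc import ABC, abstractmethod

class WeatherStore(ABC):
    """Hypothetical interface agreed during interface design."""

    @abstractmethod
    def save_reading(self, station_id: str, temperature: float) -> None:
        """Record one temperature reading for a station."""

    @abstractmethod
    def latest_reading(self, station_id: str) -> float:
        """Return the most recent temperature for a station."""

# One possible implementation, developed separately once the
# interface specification is fixed.
class InMemoryWeatherStore(WeatherStore):
    def __init__(self) -> None:
        self._readings: dict[str, list[float]] = {}

    def save_reading(self, station_id: str, temperature: float) -> None:
        self._readings.setdefault(station_id, []).append(temperature)

    def latest_reading(self, station_id: str) -> float:
        return self._readings[station_id][-1]

# Client code uses the component through the interface only, without
# knowing how it is implemented.
def report(store: WeatherStore, station_id: str) -> str:
    return f"{station_id}: {store.latest_reading(station_id)}C"

store = InMemoryWeatherStore()
store.save_reading("S1", 21.5)
print(report(store, "S1"))  # S1: 21.5C
```

Because the interface is precise, InMemoryWeatherStore could later be replaced by a database-backed implementation without changing any client code.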

These activities lead to the design outputs, which are also shown in Figure 2.5. For critical systems, the outputs of the design process are detailed design documents setting out precise and accurate descriptions of the system. If a model-driven approach is used (Chapter 5), the design outputs are design diagrams. Where agile methods of development are used, the outputs of the design process may not be separate specification documents but may be represented in the code of the program.

Figure 2.6 Stages of testing (component testing, system testing, customer testing)

The development of a program to implement a system follows naturally from system design. Although some classes of program, such as safety-critical systems, are usually designed in detail before any implementation begins, it is more common for design and program development to be interleaved. Software development tools may be used to generate a skeleton program from a design. This includes code to define and implement interfaces, and, in many cases, the developer need only add details of the operation of each program component.
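A generated skeleton of the kind described might look like the sketch below. The component, its methods, and the order-handling domain are all invented for illustration; a real tool would derive the signatures from the design model.

```python
# Skeleton as a code-generation tool might emit it: the interface is
# defined, the operations are stubbed, and the developer fills them in.
class OrderComponent:
    """Hypothetical component skeleton generated from a design model."""

    def place_order(self, item: str, quantity: int) -> int:
        raise NotImplementedError  # developer adds the operation details

# The developer-completed version of the same component.
class SimpleOrderComponent(OrderComponent):
    def __init__(self) -> None:
        self._next_id = 1
        self.orders: dict[int, tuple[str, int]] = {}

    def place_order(self, item: str, quantity: int) -> int:
        order_id = self._next_id
        self._next_id += 1
        self.orders[order_id] = (item, quantity)
        return order_id

component = SimpleOrderComponent()
print(component.place_order("bolt", 10))  # 1
```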

Programming is an individual activity, and there is no general process that is usually followed. Some programmers start with components that they understand, develop these, and then move on to less understood components. Others take the opposite approach, leaving familiar components till last because they know how to develop them. Some developers like to define data early in the process and then use this to drive the program development; others leave data unspecified for as long as possible.

Normally, programmers carry out some testing of the code they have developed. This often reveals program defects (bugs) that must be removed from the program. Finding and fixing program defects is called debugging. Defect testing and debugging are different processes. Testing establishes the existence of defects. Debugging is concerned with locating and correcting these defects.

When you are debugging, you have to generate hypotheses about the observable behavior of the program and then test these hypotheses in the hope of finding the fault that caused the output anomaly. Testing the hypotheses may involve tracing the program code manually. It may require new test cases to localize the problem. Interactive debugging tools, which show the intermediate values of program variables and a trace of the statements executed, are usually used to support the debugging process.
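The test-hypothesize-trace-fix cycle described above can be sketched in code. The buggy averaging function and the trace statements are invented for illustration; the trace prints stand in for the intermediate values a debugger would show.

```python
# A deliberately buggy function, standing in for a defective component.
def average(values):
    total = 0
    for v in values:
        total += v
    return total / (len(values) - 1)   # bug: off-by-one divisor

# Step 1: testing establishes the existence of a defect.
print(average([2, 4, 6]))   # expected 4.0, observed 6.0 -> output anomaly

# Step 2: hypothesize "the sum is right but the divisor is wrong" and
# test the hypothesis by tracing intermediate values.
def traced_average(values):
    total = 0
    for v in values:
        total += v
        print("after adding", v, "total =", total)   # trace: sum is correct
    divisor = len(values) - 1
    print("divisor =", divisor)                       # trace confirms hypothesis
    return total / divisor

traced_average([2, 4, 6])

# Step 3: debugging locates the fault; fixing it corrects the defect.
def fixed_average(values):
    return sum(values) / len(values)

print(fixed_average([2, 4, 6]))   # 4.0
```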

2.2.3 Software validation

Software validation or, more generally, verification and validation (V & V) is intended to show that a system both conforms to its specification and meets the expectations of the system customer. Program testing, where the system is executed using simulated test data, is the principal validation technique. Validation may also involve checking processes, such as inspections and reviews, at each stage of the software process from user requirements definition to program development. However, most V & V time and effort is spent on program testing.

Except for small programs, systems should not be tested as a single, monolithic unit. Figure 2.6 shows a three-stage testing process in which system components are individually tested, then the integrated system is tested. For custom software, customer testing involves testing the system with real customer data. For products that are sold as applications, customer testing is sometimes called beta testing, where selected users try out and comment on the software.


The stages in the testing process are:

1. Component testing The components making up the system are tested by the people developing the system. Each component is tested independently, without other system components. Components may be simple entities such as functions or object classes or may be coherent groupings of these entities. Test automation tools, such as JUnit for Java, that can rerun tests when new versions of the component are created, are commonly used (Koskela 2013).

2. System testing System components are integrated to create a complete system. This process is concerned with finding errors that result from unanticipated interactions between components and component interface problems. It is also concerned with showing that the system meets its functional and non-functional requirements, and testing the emergent system properties. For large systems, this may be a multistage process where components are integrated to form subsystems that are individually tested before these subsystems are integrated to form the final system.

3. Customer testing This is the final stage in the testing process before the system is accepted for operational use. The system is tested by the system customer (or potential customer) rather than with simulated test data. For custom-built software, customer testing may reveal errors and omissions in the system requirements definition, because the real data exercise the system in different ways from the test data. Customer testing may also reveal requirements problems where the system's facilities do not really meet the users' needs or the system performance is unacceptable. For products, customer testing shows how well the software product meets the customer's needs.
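Automated component tests of the kind mentioned in stage 1 can be sketched with Python's unittest module (the JUnit-style framework in the standard library). The component under test, a small classification function, is invented for illustration; the point is that the suite can be rerun unchanged whenever a new version of the component is created.

```python
import unittest

# Component under test -- a simple function standing in for a component.
def classify_temperature(celsius: float) -> str:
    if celsius < 0:
        return "freezing"
    if celsius < 25:
        return "normal"
    return "hot"

class ClassifyTemperatureTests(unittest.TestCase):
    """Rerunnable automated tests for the component."""

    def test_freezing(self):
        self.assertEqual(classify_temperature(-5), "freezing")

    def test_normal(self):
        self.assertEqual(classify_temperature(10), "normal")

    def test_boundary(self):
        self.assertEqual(classify_temperature(25), "hot")

# Run the suite programmatically; in practice a build tool would rerun
# it on every change to the component.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClassifyTemperatureTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```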

Ideally, component defects are discovered early in the testing process, and interface problems are found when the system is integrated. However, as defects are discovered, the program must be debugged, and this may require other stages in the testing process to be repeated. Errors in program components, say, may come to light during system testing. The process is therefore an iterative one, with information being fed back from later stages to earlier parts of the process.

Normally, component testing is simply part of the normal development process. Programmers make up their own test data and incrementally test the code as it is developed. The programmer knows the component and is therefore the best person to generate test cases.

If an incremental approach to development is used, each increment should be tested as it is developed, with these tests based on the requirements for that increment. In test-driven development, which is a normal part of agile processes, tests are developed along with the requirements before development starts. This helps the testers and developers to understand the requirements and ensures that there are no delays as test cases are created.
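The test-first ordering of test-driven development can be sketched as follows. The "shopping basket" requirement and all names are invented for illustration; what matters is that the test is written from the requirement before the code exists, and the implementation is then written to make it pass.

```python
# 1. Requirement: the basket total is the sum of price * quantity per
#    item. The test is written first, before any implementation exists.
def test_basket_total():
    basket = Basket()
    basket.add("apple", price=0.5, quantity=4)
    basket.add("loaf", price=1.2, quantity=1)
    assert basket.total() == 3.2

# 2. The implementation for this increment is then written to satisfy
#    the pre-existing test.
class Basket:
    def __init__(self):
        self.items = []

    def add(self, name, price, quantity):
        self.items.append((name, price, quantity))

    def total(self):
        return round(sum(p * q for _, p, q in self.items), 2)

test_basket_total()   # passes once the increment is implemented
print("test passed")
```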

When a plan-driven software process is used (e.g., for critical systems development), testing is driven by a set of test plans. An independent team of testers works from these test plans, which have been developed from the system specification and design. Figure 2.7 illustrates how test plans are the link between testing and development activities. This is sometimes called the V-model of development (turn it on its side to see the V). The V-model shows the software validation activities that correspond to each stage of the waterfall process model.

Figure 2.7 Testing phases in a plan-driven software process (test plans link requirements specification, system specification, system design, and component design to customer testing, system integration testing, sub-system integration testing, and component coding and testing)

When a system is to be marketed as a software product, a testing process called beta testing is often used. Beta testing involves delivering a system to a number of potential customers who agree to use that system. They report problems to the system developers. This exposes the product to real use and detects errors that may not have been anticipated by the product developers. After this feedback, the software product may be modified and released for further beta testing or general sale.

2.2.4 Software evolution

The flexibility of software is one of the main reasons why more and more software is being incorporated into large, complex systems. Once a decision has been made to manufacture hardware, it is very expensive to make changes to the hardware design. However, changes can be made to software at any time during or after the system development. Even extensive changes are still much cheaper than corresponding changes to system hardware.

Historically, there has always been a split between the process of software development and the process of software evolution (software maintenance). People think of software development as a creative activity in which a software system is developed from an initial concept through to a working system, but they sometimes see software maintenance as dull and uninteresting, less challenging than original software development.

This distinction between development and maintenance is increasingly irrelevant. Very few software systems are completely new systems, and it makes much more sense to see development and maintenance as a continuum. Rather than two separate processes, it is more realistic to think of software engineering as an evolutionary process (Figure 2.8) where software is continually changed over its lifetime in response to changing requirements and customer needs.

Figure 2.8 Software system evolution (define system requirements, assess existing systems, propose system changes, modify systems)

2.3 Coping with change

Change is inevitable in all large software projects. The system requirements change as businesses respond to external pressures, competition, and changed management priorities. As new technologies become available, new approaches to design and implementation become possible. Therefore, whatever software process model is used, it is essential that it can accommodate changes to the software being developed.

Change adds to the costs of software development because it usually means that work that has been completed has to be redone. This is called rework. For example, if the relationships between the requirements in a system have been analyzed and new requirements are then identified, some or all of the requirements analysis has to be repeated. It may then be necessary to redesign the system to deliver the new requirements, change any programs that have been developed, and retest the system.

Two related approaches may be used to reduce the costs of rework:

1. Change anticipation, where the software process includes activities that can anticipate or predict possible changes before significant rework is required. For example, a prototype system may be developed to show some key features of the system to customers. They can experiment with the prototype and refine their requirements before committing to high software production costs.

2. Change tolerance, where the process and software are designed so that changes can be easily made to the system. This normally involves some form of incremental development. Proposed changes may be implemented in increments that have not yet been developed. If this is impossible, then only a single increment (a small part of the system) may have to be altered to incorporate the change.


In this section, I discuss two ways of coping with change and changing system requirements:

1. System prototyping, where a version of the system or part of the system is developed quickly to check the customer's requirements and the feasibility of design decisions. This is a method of change anticipation as it allows users to experiment with the system before delivery and so refine their requirements. The number of requirements change proposals made after delivery is therefore likely to be reduced.

2. Incremental delivery, where system increments are delivered to the customer for comment and experimentation. This supports both change avoidance and change tolerance. It avoids the premature commitment to requirements for the whole system and allows changes to be incorporated into later increments at relatively low cost.

The notion of refactoring, namely, improving the structure and organization of a program, is also an important mechanism that supports change tolerance. I discuss this in Chapter 3 (Agile methods).
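A small before-and-after sketch illustrates what refactoring means in practice: the structure improves while the behavior stays the same. Both versions and the discount domain are invented for illustration.

```python
# Before: duplicated discount logic and unexplained constants.
def price_before(kind, amount):
    if kind == "student":
        return amount - amount * 0.1
    elif kind == "senior":
        return amount - amount * 0.15
    else:
        return amount

# After: the duplicated calculation is extracted and the rates are
# named. The organization is clearer, but the behavior is identical.
DISCOUNT_RATES = {"student": 0.10, "senior": 0.15}

def price_after(kind, amount):
    rate = DISCOUNT_RATES.get(kind, 0.0)
    return amount * (1 - rate)

# Behavior is preserved -- existing tests still pass after refactoring.
for kind in ("student", "senior", "regular"):
    assert abs(price_before(kind, 100.0) - price_after(kind, 100.0)) < 1e-9
print("refactoring preserved behavior")
```

Because refactoring changes structure without changing behavior, a suite of automated tests like the assertion above is the usual safety net for doing it continuously.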

2.3.1 Prototyping

A prototype is an early version of a software system that is used to demonstrate concepts, try out design options, and find out more about the problem and its possible solutions. Rapid, iterative development of the prototype is essential so that costs are controlled and system stakeholders can experiment with the prototype early in the software process.

A software prototype can be used in a software development process to help anticipate changes that may be required:

1. In the requirements engineering process, a prototype can help with the elicitation and validation of system requirements.

2. In the system design process, a prototype can be used to explore software solutions and in the development of a user interface for the system.

System prototypes allow potential users to see how well the system supports their work. They may get new ideas for requirements and find areas of strength and weakness in the software. They may then propose new system requirements. Furthermore, as the prototype is developed, it may reveal errors and omissions in the system requirements. A feature described in a specification may seem to be clear and useful. However, when that function is combined with other functions, users often find that their initial view was incorrect or incomplete. The system specification can then be modified to reflect the changed understanding of the requirements.

Figure 2.9 Prototype development (establish prototype objectives, define prototype functionality, develop prototype, evaluate prototype, producing a prototyping plan, an outline definition, an executable prototype, and an evaluation report)

A system prototype may be used while the system is being designed to carry out design experiments to check the feasibility of a proposed design. For example, a database design may be prototyped and tested to check that it supports efficient data access for the most common user queries. Rapid prototyping with end-user involvement is the only sensible way to develop user interfaces. Because of the dynamic nature of user interfaces, textual descriptions and diagrams are not good enough for expressing the user interface requirements and design.
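The database-design experiment mentioned above can be sketched with an in-memory SQLite database. The schema, data volumes, and query are invented for illustration; a real prototype would use realistic data volumes and the target database engine.

```python
import sqlite3
import time

# Prototype the candidate design in a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (station TEXT, day INTEGER, temp REAL)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [(f"S{n % 50}", n % 365, 15.0 + (n % 20)) for n in range(10_000)],
)

# Design decision under test: index the column used by the most common query.
conn.execute("CREATE INDEX idx_station ON reading (station)")

# Time the most common user query against the prototype.
start = time.perf_counter()
count, avg = conn.execute(
    "SELECT COUNT(*), AVG(temp) FROM reading WHERE station = ?", ("S7",)
).fetchone()
elapsed = time.perf_counter() - start

print(f"matched {count} readings, avg {avg:.1f}C in {elapsed * 1000:.2f} ms")
```

If the measured access times are unacceptable, the schema or indexing can be revised cheaply at this stage, before the design is committed to.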

A process model for prototype development is shown in Figure 2.9. The objectives of prototyping should be made explicit from the start of the process. These may be to develop the user interface, to develop a system to validate functional system requirements, or to develop a system to demonstrate the application to managers. The same prototype usually cannot meet all objectives. If the objectives are left unstated, management or end-users may misunderstand the function of the prototype. Consequently, they may not get the benefits that they expected from the prototype development.

The next stage in the process is to decide what to put into and, perhaps more importantly, what to leave out of the prototype system. To reduce prototyping costs and accelerate the delivery schedule, you may leave some functionality out of the prototype. You may decide to relax non-functional requirements such as response time and memory utilization. Error handling and management may be ignored unless the objective of the prototype is to establish a user interface. Standards of reliability and program quality may be reduced.

The final stage of the process is prototype evaluation. Provision must be made during this stage for user training, and the prototype objectives should be used to derive a plan for evaluation. Potential users need time to become comfortable with a new system and to settle into a normal pattern of usage. Once they are using the system normally, they then discover requirements errors and omissions. A general problem with prototyping is that users may not use the prototype in the same way as they use the final system. Prototype testers may not be typical of system users. There may not be enough time to train users during prototype evaluation. If the prototype is slow, the evaluators may adjust their way of working and avoid those system features that have slow response times. When provided with better response in the final system, they may use it in a different way.

Figure 2.10 Incremental delivery (define outline requirements, assign requirements to increments, design system architecture, then develop, validate, integrate, and deploy each increment, repeating while the system is incomplete until the final system is delivered)

2.3.2 Incremental delivery

Incremental delivery (Figure 2.10) is an approach to software development where some of the developed increments are delivered to the customer and deployed for use in their working environment. In an incremental delivery process, customers define which of the services are most important and which are least important to them. A number of delivery increments are then defined, with each increment providing a subset of the system functionality. The allocation of services to increments depends on the service priority, with the highest priority services implemented and delivered first.

Once the system increments have been identified, the requirements for the services to be delivered in the first increment are defined in detail and that increment is developed. During development, further requirements analysis for later increments can take place, but requirements changes for the current increment are not accepted.

Once an increment is completed and delivered, it is installed in the customer's normal working environment. They can experiment with the system, and this helps them clarify their requirements for later system increments. As new increments are completed, they are integrated with existing increments so that system functionality improves with each delivered increment.

Incremental delivery has a number of advantages:

1. Customers can use the early increments as prototypes and gain experience that informs their requirements for later system increments. Unlike prototypes, these are part of the real system, so there is no relearning when the complete system is available.

2. Customers do not have to wait until the entire system is delivered before they can gain value from it. The first increment satisfies their most critical requirements, so they can use the software immediately.

3. The process maintains the benefits of incremental development in that it should be relatively easy to incorporate changes into the system.

4. As the highest priority services are delivered first and later increments are then integrated, the most important system services receive the most testing. This means that customers are less likely to encounter software failures in the most important parts of the system.

However, there are problems with incremental delivery. In practice, it only works in situations where a brand-new system is being introduced and the system evaluators are given time to experiment with the new system. Key problems with this approach are:

1. Iterative delivery is problematic when the new system is intended to replace an existing system. Users need all of the functionality of the old system and are usually unwilling to experiment with an incomplete new system. It is often impractical to use the old and the new systems alongside each other as they are likely to have different databases and user interfaces.

2. Most systems require a set of basic facilities that are used by different parts of the system. As requirements are not defined in detail until an increment is to be implemented, it can be hard to identify common facilities that are needed by all increments.

3. The essence of iterative processes is that the specification is developed in conjunction with the software. However, this conflicts with the procurement model of many organizations, where the complete system specification is part of the system development contract. In the incremental approach, there is no complete system specification until the final increment is specified. This requires a new form of contract, which large customers such as government agencies may find difficult to accommodate.

For some types of systems, incremental development and delivery is not the best approach. These are very large systems where development may involve teams working in different locations, some embedded systems where the software depends on hardware development, and some critical systems where all the requirements must be analyzed to check for interactions that may compromise the safety or security of the system.

These large systems, of course, suffer from the same problems of uncertain and changing requirements. Therefore, to address these problems and get some of the benefits of incremental development, a system prototype may be developed and used as a platform for experiments with the system requirements and design. With the experience gained from the prototype, definitive requirements can then be agreed.

2.4 Process improvement

Nowadays, there is a constant demand from industry for cheaper, better software, which has to be delivered to ever-tighter deadlines. Consequently, many software companies have turned to software process improvement as a way of enhancing the quality of their software, reducing costs, or accelerating their development processes. Process improvement means understanding existing processes and changing these processes to increase product quality and/or reduce costs and development time. I cover general issues of process measurement and process improvement in detail in web Chapter 26.

Figure 2.11 The process improvement cycle (measure, analyze, change)

Two quite different approaches to process improvement and change are used:

1. The process maturity approach, which has focused on improving process and project management and introducing good software engineering practice into an organization. The level of process maturity reflects the extent to which good technical and management practice has been adopted in organizational software development processes. The primary goals of this approach are improved product quality and process predictability.

2. The agile approach, which has focused on iterative development and the reduction of overheads in the software process. The primary characteristics of agile methods are rapid delivery of functionality and responsiveness to changing customer requirements. The improvement philosophy here is that the best processes are those with the lowest overheads, and agile approaches can achieve this. I describe agile approaches in Chapter 3.

People who are enthusiastic about and committed to each of these approaches are generally skeptical of the benefits of the other. The process maturity approach is rooted in plan-driven development and usually requires increased "overhead," in the sense that activities are introduced that are not directly relevant to program development. Agile approaches focus on the code being developed and deliberately minimize formality and documentation.

The general process improvement process underlying the process maturity approach is a cyclical process, as shown in Figure 2.11. The stages in this process are:

1. Process measurement You measure one or more attributes of the software process or product. These measurements form a baseline that helps you decide if process improvements have been effective. As you introduce improvements, you re-measure the same attributes, which will hopefully have improved in some way.

2. Process analysis The current process is assessed, and process weaknesses and bottlenecks are identified. Process models (sometimes called process maps) that describe the process may be developed during this stage. The analysis may be focused by considering process characteristics such as rapidity and robustness.

3. Process change Process changes are proposed to address some of the identified process weaknesses. These are introduced, and the cycle resumes to collect data about the effectiveness of the changes.
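The measure-analyze-change cycle above can be made concrete with a simple process metric. Defect density is one common choice; the figures below are invented for illustration, not real project data.

```python
# Defects per thousand lines of code -- one common process metric.
def defect_density(defects_found: int, size_kloc: float) -> float:
    return defects_found / size_kloc

# 1. Measure: establish a baseline before any change is introduced.
baseline = defect_density(defects_found=120, size_kloc=40.0)

# 2. Analyze: suppose analysis suggests weak component testing, and the
#    proposed change is to introduce automated regression testing.
# 3. Change, then re-measure the same attribute on the next release to
#    judge whether the change was effective.
after_change = defect_density(defects_found=75, size_kloc=42.0)

improvement = (baseline - after_change) / baseline
print(f"baseline {baseline:.2f}, after change {after_change:.2f} "
      f"defects/KLOC ({improvement:.0%} improvement)")
```

The key point is that the same attribute is measured before and after the change, so the comparison is against a baseline rather than an impression.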

Without concrete data on a process or the software developed using that process, it is impossible to assess the value of process improvement. However, companies starting the process improvement process are unlikely to have process data available as an improvement baseline. Therefore, as part of the first cycle of changes, you may have to collect data about the software process and to measure software product characteristics.

Process improvement is a long-term activity, so each of the stages in the improvement process may last several months. It is also a continuous activity as, whatever new processes are introduced, the business environment will change and the new processes will themselves have to evolve to take these changes into account.

The notion of process maturity was introduced in the late 1980s when the Software Engineering Institute (SEI) proposed their model of process capability maturity (Humphrey 1988). The maturity of a software company's processes reflects the process management, measurement, and use of good software engineering practices in the company. This idea was introduced so that the U.S. Department of Defense could assess the software engineering capability of defense contractors, with a view to limiting contracts to those contractors who had reached a required level of process maturity. Five levels of process maturity were proposed, as shown in Figure 2.12. These have evolved and developed over the last 25 years (Chrissis, Konrad, and Shrum 2011), but the fundamental ideas in Humphrey's model are still the basis of software process maturity assessment.

The levels in the process maturity model are:

1. Initial The goals associated with the process area are satisfied, and for all processes the scope of the work to be performed is explicitly set out and communicated to the team members.

2. Managed At this level, the goals associated with the process area are met, and organizational policies are in place that define when each process should be used. There must be documented project plans that define the project goals. Resource management and process monitoring procedures must be in place across the institution.

3. Defined This level focuses on organizational standardization and deployment of processes. Each project has a managed process that is adapted to the project requirements from a defined set of organizational processes. Process assets and process measurements must be collected and used for future process improvements.

68 Chapter 2 Software processes

Figure 2.12 Capability maturity levels: Level 1 Initial; Level 2 Managed; Level 3 Defined; Level 4 Quantitatively managed; Level 5 Optimizing.

4. Quantitatively managed At this level, there is an organizational responsibility to use statistical and other quantitative methods to control subprocesses. That is, collected process and product measurements must be used in process management.

5. Optimizing At this highest level, the organization must use the process and product measurements to drive process improvement. Trends must be analyzed and the processes adapted to changing business needs.

The work on process maturity levels has had a major impact on the software industry. It focused attention on the software engineering processes and practices that were used and led to significant improvements in software engineering capability. However, there is too much overhead in formal process improvement for small companies, and maturity estimation with agile processes is difficult. Consequently, only large software companies now use this maturity-focused approach to software process improvement.

Key points

Software processes are the activities involved in producing a software system. Software process models are abstract representations of these processes.

General process models describe the organization of software processes. Examples of these general models include the waterfall model, incremental development, and reusable component configuration and integration.


Requirements engineering is the process of developing a software specification. Specifications are intended to communicate the system needs of the customer to the system developers.

Design and implementation processes are concerned with transforming a requirements specification into an executable software system.

Software validation is the process of checking that the system conforms to its specification and that it meets the real needs of the users of the system.

Software evolution takes place when you change existing software systems to meet new requirements. Changes are continuous, and the software must evolve to remain useful.

Processes should include activities to cope with change. This may involve a prototyping phase that helps avoid poor decisions on requirements and design. Processes may be structured for iterative development and delivery so that changes may be made without disrupting the system as a whole.

Process improvement is the process of improving existing software processes to improve software quality, lower development costs, or reduce development time. It is a cyclic process involving process measurement, analysis, and change.

Further reading

"Process Models in Software Engineering." This is an excellent overview of a wide range of software engineering process models that have been proposed. (W. Scacchi, Encyclopaedia of Software Engineering, ed. J. J. Marciniak, John Wiley & Sons, 2001) http://www.ics.uci.edu/~wscacchi/Papers/SE-Encyc/Process-Models-SE-Encyc.pdf

Software Process Improvement: Results and Experience from the Field. This book is a collection of papers focusing on process improvement case studies in several small and medium-sized Norwegian companies. It also includes a good introduction to the general issues of process improvement. (Conradi, R., Dybå, T., Sjøberg, D., and Ulsund, T. (eds.), Springer, 2006)

"Software Development Life Cycle Models and Methodologies." This blog post is a succinct summary of several software process models that have been proposed and used. It discusses the advantages and disadvantages of each of these models. (M. Sami, 2012) http://melsatar.wordpress.com/2012/03/15/software-development-life-cycle-models-and-methodologies/

Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/software-engineering/


Exercises

2.1. Suggest the most appropriate generic software process model that might be used as a basis for managing the development of the following systems. Explain your answer according to the type of system being developed:

A system to control antilock braking in a car

A virtual reality system to support software maintenance

A university accounting system that replaces an existing system

An interactive travel planning system that helps users plan journeys with the lowest environmental impact

2.2. Incremental software development could be very effectively used for customers who do not have a clear idea about the systems needed for their operations. Discuss.

2.3. Consider the integration and configuration process model shown in Figure 2.3. Explain why it is essential to repeat the requirements engineering activity in the process.

2.4. Suggest why it is important to make a distinction between developing the user requirements and developing system requirements in the requirements engineering process.

2.5. Using an example, explain why the design activities of architectural design, database design, interface design, and component design are interdependent.

2.6. Explain why software testing should always be an incremental, staged activity. Are programmers the best people to test the programs that they have developed?

2.7. Imagine that a government wants a software program that helps to keep track of the utilization of the country's vast mineral resources. Although the requirements put forward by the government were not very clear, a software company was tasked with the development of a prototype. The government found the prototype impressive, and asked that it be extended to be the actual system that would be used. Discuss the pros and cons of taking this approach.

2.8. You have developed a prototype of a software system and your manager is very impressed by it. She proposes that it should be put into use as a production system, with new features added as required. This avoids the expense of system development and makes the system immediately useful. Write a short report for your manager explaining why prototype systems should not normally be used as production systems.

2.9. Suggest two advantages and two disadvantages of the approach to process assessment and improvement that is embodied in the SEI's Capability Maturity framework.

2.10. Historically, the introduction of technology has caused profound changes in the labor market and, temporarily at least, displaced people from jobs. Discuss whether the introduction of extensive process automation is likely to have the same consequences for software engineers. If you don't think it will, explain why not. If you think that it will reduce job opportunities, is it ethical for the engineers affected to passively or actively resist the introduction of this technology?


References

Abrial, J. R. 2005. The B Book: Assigning Programs to Meanings. Cambridge, UK: Cambridge University Press.

———. 2010. Modeling in Event-B: System and Software Engineering. Cambridge, UK: Cambridge University Press.

Boehm, B. W. 1988. "A Spiral Model of Software Development and Enhancement." IEEE Computer 21 (5): 61–72. doi:10.1145/12944.12948.

Boehm, B. W., and R. Turner. 2004. "Balancing Agility and Discipline: Evaluating and Integrating Agile and Plan-Driven Methods." In 26th Int. Conf. on Software Engineering, Edinburgh, Scotland. doi:10.1109/ICSE.2004.1317503.

Chrissis, M. B., M. Konrad, and S. Shrum. 2011. CMMI for Development: Guidelines for Process Integration and Product Improvement, 3rd ed. Boston: Addison-Wesley.

Humphrey, W. S. 1988. "Characterizing the Software Process: A Maturity Framework." IEEE Software 5 (2): 73–79. doi:10.1109/2.59.

Koskela, L. 2013. Effective Unit Testing: A Guide for Java Developers. Greenwich, CT: Manning Publications.

Kruchten, P. 2003. The Rational Unified Process—An Introduction, 3rd ed. Reading, MA: Addison-Wesley.

Royce, W. W. 1970. "Managing the Development of Large Software Systems: Concepts and Techniques." In IEEE WESTCON, 1–9. Los Angeles, CA.

Wheeler, W., and J. White. 2013. Spring in Practice. Greenwich, CT: Manning Publications.

3 Agile software development

Objectives

The objective of this chapter is to introduce you to agile software development methods. When you have read the chapter, you will:

understand the rationale for agile software development methods, the agile manifesto, and the differences between agile and plan-driven development;

know about important agile development practices such as user stories, refactoring, pair programming, and test-first development;

understand the Scrum approach to agile project management;

understand the issues of scaling agile development methods and combining agile approaches with plan-driven approaches in the development of large software systems.

Contents

3.1 Agile methods
3.2 Agile development techniques
3.3 Agile project management
3.4 Scaling agile methods


Businesses now operate in a global, rapidly changing environment. They have to respond to new opportunities and markets, changing economic conditions, and the emergence of competing products and services. Software is part of almost all business operations, so new software has to be developed quickly to take advantage of new opportunities and to respond to competitive pressure. Rapid software development and delivery is therefore the most critical requirement for most business systems. In fact, businesses may be willing to trade off software quality and compromise on requirements if they can deploy essential new software quickly.

Because these businesses are operating in a changing environment, it is practically impossible to derive a complete set of stable software requirements. Requirements change because customers find it impossible to predict how a system will affect working practices, how it will interact with other systems, and what user operations should be automated. It may only be after a system has been delivered and users gain experience with it that the real requirements become clear. Even then, external factors drive requirements change.

Plan-driven software development processes that completely specify the requirements and then design, build, and test a system are not geared to rapid software development. As the requirements change or as requirements problems are discovered, the system design or implementation has to be reworked and retested. As a consequence, a conventional waterfall or specification-based process is usually a lengthy one, and the final software is delivered to the customer long after it was originally specified.

For some types of software, such as safety-critical control systems, where a complete analysis of the system is essential, this plan-driven approach is the right one. However, in a fast-moving business environment, it can cause real problems. By the time the software is available for use, the original reason for its procurement may have changed so radically that the software is effectively useless. Therefore, for business systems in particular, development processes that focus on rapid software development and delivery are essential.

The need for rapid software development and processes that can handle changing requirements has been recognized for many years (Larman and Basili 2003). However, faster software development really took off in the late 1990s with the development of the idea of "agile methods" such as Extreme Programming (Beck 1999), Scrum (Schwaber and Beedle 2001), and DSDM (Stapleton 2003). Rapid software development became known as agile development or agile methods. These agile methods are designed to produce useful software quickly. All of the agile methods that have been proposed share a number of common characteristics:

1. The processes of specification, design, and implementation are interleaved. There is no detailed system specification, and design documentation is minimized or generated automatically by the programming environment used to implement the system. The user requirements document is an outline definition of the most important characteristics of the system.

2. The system is developed in a series of increments. End-users and other system stakeholders are involved in specifying and evaluating each increment. They may propose changes to the software and new requirements that should be implemented in a later version of the system.

Figure 3.1 Plan-driven and agile development. (Plan-based development: requirements engineering produces a requirements specification, which is the input to design and implementation; requirements change requests flow back to requirements engineering. Agile development: requirements engineering and design and implementation are interleaved.)

3. Extensive tool support is used to support the development process. Tools that may be used include automated testing tools, tools to support configuration management and system integration, and tools to automate user interface production.

Agile methods are incremental development methods in which the increments are small and, typically, new releases of the system are created and made available to customers every two or three weeks. They involve customers in the development process to get rapid feedback on changing requirements. They minimize documentation by using informal communications rather than formal meetings with written documents.

Agile approaches to software development consider design and implementation to be the central activities in the software process. They incorporate other activities, such as requirements elicitation and testing, into design and implementation. By contrast, a plan-driven approach to software engineering identifies separate stages in the software process, with outputs associated with each stage. The outputs from one stage are used as a basis for planning the following process activity.

Figure 3.1 shows the essential distinctions between plan-driven and agile approaches to system specification. In a plan-driven software development process, iteration occurs within activities, with formal documents used to communicate between stages of the process. For example, the requirements will evolve, and, ultimately, a requirements specification will be produced. This is then an input to the design and implementation process. In an agile approach, iteration occurs across activities. Therefore, the requirements and the design are developed together rather than separately.

In practice, as I explain in Section 3.4.1, plan-driven processes are often used along with agile programming practices, and agile methods may incorporate some planned activities apart from programming and testing. It is perfectly feasible, in a plan-driven process, to allocate requirements and plan the design and development phase as a series of increments. An agile process is not inevitably code-focused, and it may produce some design documentation. Agile developers may decide that an iteration should not produce new code but rather should produce system models and documentation.

3.1 Agile methods

In the 1980s and early 1990s, there was a widespread view that the best way to achieve better software was through careful project planning, formalized quality assurance, use of analysis and design methods supported by software tools, and controlled and rigorous software development processes. This view came from the software engineering community that was responsible for developing large, long-lived software systems such as aerospace and government systems.

This plan-driven approach was developed for software developed by large teams, working for different companies. Teams were often geographically dispersed and worked on the software for long periods of time. An example of this type of software is the control systems for a modern aircraft, which might take up to 10 years from initial specification to deployment. Plan-driven approaches involve a significant overhead in planning, designing, and documenting the system. This overhead is justified when the work of multiple development teams has to be coordinated, when the system is a critical system, and when many different people will be involved in maintaining the software over its lifetime.

However, when this heavyweight, plan-driven development approach is applied to small and medium-sized business systems, the overhead involved is so large that it dominates the software development process. More time is spent on how the system should be developed than on program development and testing. As the system requirements change, rework is essential and, in principle at least, the specification and design have to change with the program.

Dissatisfaction with these heavyweight approaches to software engineering led to the development of agile methods in the late 1990s. These methods allowed the development team to focus on the software itself rather than on its design and documentation. They are best suited to application development where the system requirements usually change rapidly during the development process. They are intended to deliver working software quickly to customers, who can then propose new and changed requirements to be included in later iterations of the system. They aim to cut down on process bureaucracy by avoiding work that has dubious long-term value and eliminating documentation that will probably never be used.

The philosophy behind agile methods is reflected in the agile manifesto (http://agilemanifesto.org) issued by the leading developers of these methods. This manifesto states:

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.

Figure 3.2 The principles of agile methods

Customer involvement: Customers should be closely involved throughout the development process. Their role is to provide and prioritize new system requirements and to evaluate the iterations of the system.

Embrace change: Expect the system requirements to change, and so design the system to accommodate these changes.

Incremental delivery: The software is developed in increments, with the customer specifying the requirements to be included in each increment.

Maintain simplicity: Focus on simplicity in both the software being developed and in the development process. Wherever possible, actively work to eliminate complexity from the system.

People, not process: The skills of the development team should be recognized and exploited. Team members should be left to develop their own ways of working without prescriptive processes.

All agile methods suggest that software should be developed and delivered incrementally. These methods are based on different agile processes, but they share a set of principles, based on the agile manifesto, and so they have much in common. I have listed these principles in Figure 3.2.

Agile methods have been particularly successful for two kinds of system development:

1. Product development, where a software company is developing a small or medium-sized product for sale. Virtually all software products and apps are now developed using an agile approach.

2. Custom system development within an organization, where there is a clear commitment from the customer to become involved in the development process and where there are few external stakeholders and regulations that affect the software.

Agile methods work well in these situations because it is possible to have continuous communications between the product manager or system customer and the development team. The software itself is a stand-alone system rather than tightly integrated with other systems being developed at the same time. Consequently, there is no need to coordinate parallel development streams. Small and medium-sized systems can be developed by co-located teams, so informal communications among team members work well.

†http://agilemanifesto.org/

Figure 3.3 The XP release cycle: select user stories for this release; break down stories to tasks; plan release; develop/integrate/test software; release software; evaluate system; then repeat.

3.2 Agile development techniques

The ideas underlying agile methods were developed around the same time by a number of different people in the 1990s. However, perhaps the most significant approach to changing software development culture was the development of Extreme Programming (XP). The name was coined by Kent Beck (Beck 1998) because the approach was developed by pushing recognized good practice, such as iterative development, to "extreme" levels. For example, in XP, several new versions of a system may be developed by different programmers, integrated, and tested in a day. Figure 3.3 illustrates the XP process to produce an increment of the system that is being developed.

In XP, requirements are expressed as scenarios (called user stories), which are implemented directly as a series of tasks. Programmers work in pairs and develop tests for each task before writing the code. All tests must be successfully executed when new code is integrated into the system. There is a short time gap between releases of the system.

Extreme programming was controversial as it introduced a number of agile practices that were quite different from the development practice of that time. These practices are summarized in Figure 3.4 and reflect the principles of the agile manifesto:

1. Incremental development is supported through small, frequent releases of the system. Requirements are based on simple customer stories or scenarios that are used as a basis for deciding what functionality should be included in a system increment.

2. Customer involvement is supported through the continuous engagement of the customer in the development team. The customer representative takes part in the development and is responsible for defining acceptance tests for the system.

3. People, not process, are supported through pair programming, collective ownership of the system code, and a sustainable development process that does not involve excessively long working hours.

4. Change is embraced through regular system releases to customers, test-first development, refactoring to avoid code degeneration, and continuous integration of new functionality.

5. Maintaining simplicity is supported by constant refactoring that improves code quality and by using simple designs that do not unnecessarily anticipate future changes to the system.

Figure 3.4 Extreme programming practices

Collective ownership: The pairs of developers work on all areas of the system, so that no islands of expertise develop and all the developers take responsibility for all of the code. Anyone can change anything.

Continuous integration: As soon as the work on a task is complete, it is integrated into the whole system. After any such integration, all the unit tests in the system must pass.

Incremental planning: Requirements are recorded on "story cards," and the stories to be included in a release are determined by the time available and their relative priority. The developers break these stories into development "tasks." See Figures 3.5 and 3.6.

On-site customer: A representative of the end-user of the system (the Customer) should be available full time for the use of the XP team. In an extreme programming process, the customer is a member of the development team and is responsible for bringing system requirements to the team for implementation.

Pair programming: Developers work in pairs, checking each other's work and providing the support to always do a good job.

Refactoring: All developers are expected to refactor the code continuously as soon as potential code improvements are found. This keeps the code simple and maintainable.

Simple design: Enough design is carried out to meet the current requirements and no more.

Small releases: The minimal useful set of functionality that provides business value is developed first. Releases of the system are frequent and incrementally add functionality to the first release.

Sustainable pace: Large amounts of overtime are not considered acceptable, as the net effect is often to reduce code quality and medium-term productivity.

Test-first development: An automated unit test framework is used to write tests for a new piece of functionality before that functionality itself is implemented.

In practice, the application of Extreme Programming as originally proposed has proved to be more difficult than anticipated. It cannot be readily integrated with the management practices and culture of most businesses. Therefore, companies adopting agile methods pick and choose those XP practices that are most appropriate for their way of working. Sometimes these are incorporated into their own development processes but, more commonly, they are used in conjunction with a management-focused agile method such as Scrum (Rubin 2013).


Figure 3.5 A "Prescribing medication" story

Kate is a doctor who wishes to prescribe medication for a patient attending a clinic. The patient record is already displayed on her computer so she clicks on the medication field and can select 'current medication', 'new medication' or 'formulary'.

If she selects 'current medication', the system asks her to check the dose. If she wants to change the dose, she enters the new dose then confirms the prescription.

If she chooses 'new medication', the system assumes that she knows which medication to prescribe. She types the first few letters of the drug name. The system displays a list of possible drugs starting with these letters. She chooses the required medication and the system responds by asking her to check that the medication selected is correct. She enters the dose then confirms the prescription.

If she chooses 'formulary', the system displays a search box for the approved formulary. She can then search for the drug required. She selects a drug and is asked to check that the medication is correct. She enters the dose then confirms the prescription.

The system always checks that the dose is within the approved range. If it isn't, Kate is asked to change the dose.

After Kate has confirmed the prescription, it will be displayed for checking. She either clicks 'OK' or 'Change'. If she clicks 'OK', the prescription is recorded on the audit database. If she clicks on 'Change', she reenters the 'Prescribing medication' process.
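A single sentence of a story like this can already suggest a development task. As a hedged sketch, assuming a hypothetical drug list and function name that the story itself does not prescribe, the "new medication" path ("the system displays a list of possible drugs starting with these letters") might begin as:

```python
# Sketch of one sentence of the story above. The drug list and the
# matching_drugs() function are invented for illustration; the story is a
# requirements artifact and does not fix any implementation.

DRUGS = ["diazepam", "diclofenac", "ibuprofen"]  # hypothetical formulary

def matching_drugs(prefix):
    """'New medication' path: list possible drugs starting with the
    first few letters the doctor has typed."""
    return [d for d in DRUGS if d.startswith(prefix)]

print(matching_drugs("di"))  # ['diazepam', 'diclofenac']
```

The point is not the code itself but the traceability: each sentence of the story can be matched against behavior the team must implement and test.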

I am not convinced that XP on its own is a practical agile method for most companies, but its most significant contribution is probably the set of agile development practices that it introduced to the community. I discuss the most important of these practices in this section.

3.2.1 User stories

Software requirements always change. To handle these changes, agile methods do not have a separate requirements engineering activity. Rather, they integrate requirements elicitation with development. To make this easier, the idea of "user stories" was developed, where a user story is a scenario of use that might be experienced by a system user.

As far as possible, the system customer works closely with the development team and discusses these scenarios with other team members. Together, they develop a "story card" that briefly describes a story that encapsulates the customer needs. The development team then aims to implement that scenario in a future release of the software. An example of a story card for the Mentcare system is shown in Figure 3.5. This is a short description of a scenario for prescribing medication for a patient.

User stories may be used in planning system iterations. Once the story cards have been developed, the development team breaks these down into tasks (Figure 3.6) and estimates the effort and resources required for implementing each task. This usually involves discussions with the customer to refine the requirements. The customer then prioritizes the stories for implementation, choosing those stories that can be used immediately to deliver useful business support. The intention is to identify useful functionality that can be implemented in about two weeks, when the next release of the system is made available to the customer.

Figure 3.6 Examples of task cards for prescribing medication

Task 1: Change dose of prescribed drug

Task 2: Formulary selection

Task 3: Dose checking. Dose checking is a safety precaution to check that the doctor has not prescribed a dangerously small or large dose. Using the formulary id for the generic drug name, look up the formulary and retrieve the recommended maximum and minimum dose. Check the prescribed dose against the minimum and maximum. If outside the range, issue an error message saying that the dose is too high or too low. If within the range, enable the 'Confirm' button.
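The steps on the "Dose checking" task card map almost directly onto code, which is why task cards of this kind are small enough to estimate and implement within an iteration. A minimal sketch, in which the formulary table, function name, and return values are all assumptions made for this illustration rather than part of the Mentcare system:

```python
# Sketch of the "Dose checking" task card. The formulary entries and
# all names below are invented for illustration.

FORMULARY = {
    "diazepam": {"min_dose": 2, "max_dose": 30},      # hypothetical entries
    "ibuprofen": {"min_dose": 200, "max_dose": 2400},
}

def check_dose(formulary_id, prescribed_dose):
    """Follow the task card: look up the recommended range for the
    generic drug, then compare the prescribed dose against it."""
    entry = FORMULARY[formulary_id]
    if prescribed_dose < entry["min_dose"]:
        return "error: dose is too low"    # outside range: issue error message
    if prescribed_dose > entry["max_dose"]:
        return "error: dose is too high"
    return "confirm enabled"               # within range: enable 'Confirm'

print(check_dose("diazepam", 10))     # confirm enabled
print(check_dose("ibuprofen", 4000))  # error: dose is too high
```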

Of course, as requirements change, the unimplemented stories change or may be discarded. If changes are required for a system that has already been delivered, new story cards are developed and, again, the customer decides whether these changes should have priority over new functionality.

The idea of user stories is a powerful one. People find it much easier to relate to these stories than to a conventional requirements document or use cases. User stories can be helpful in getting users involved in suggesting requirements during an initial predevelopment requirements elicitation activity. I discuss this in more detail in Chapter 4.

The principal problem with user stories is completeness. It is difficult to judge if enough user stories have been developed to cover all of the essential requirements of a system. It is also difficult to judge if a single story gives a true picture of an activity. Experienced users are often so familiar with their work that they leave things out when describing it.

3.2.2 Refactoring

A fundamental precept of traditional software engineering is that you should design for change. That is, you should anticipate future changes to the software and design it so that these changes can be easily implemented. Extreme programming, however, has discarded this principle on the basis that designing for change is often wasted effort. It isn't worth taking time to add generality to a program to cope with change. Often the changes anticipated never materialize, or completely different change requests may actually be made.

Of course, in practice, changes will always have to be made to the code being developed. To make these changes easier, the developers of XP suggested that the code being developed should be constantly refactored. Refactoring (Fowler et al. 1999) means that the programming team looks for possible improvements to the software and implements them immediately. When team members see code that can be improved, they make these improvements even in situations where there is no immediate need for them.

A fundamental problem of incremental development is that local changes tend to degrade the software structure. Consequently, further changes to the software become harder and harder to implement. Essentially, the development proceeds by finding workarounds to problems, with the result that code is often duplicated, parts of the software are reused in inappropriate ways, and the overall structure degrades as code is added to the system. Refactoring improves the software structure and readability and so avoids the structural deterioration that naturally occurs when software is changed.

Examples of refactoring include the reorganization of a class hierarchy to remove duplicate code, the tidying up and renaming of attributes and methods, and the replacement of similar code sections with calls to methods defined in a program library. Program development environments usually include tools for refactoring. These simplify the process of finding dependencies between code sections and making global code modifications.
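As a small illustrative sketch (not from the book; the functions and the totalling logic are invented), the “remove duplicate code” refactoring might look like this, with a duplicated loop extracted into a single shared helper:

```python
# Before refactoring: two functions duplicate the same totalling loop.
def invoice_total(lines):
    total = 0
    for quantity, unit_price in lines:
        total += quantity * unit_price
    return total

def order_total(lines):
    total = 0
    for quantity, unit_price in lines:
        total += quantity * unit_price
    return total

# After refactoring: the duplicated code is moved into one helper, so a
# future change (e.g., a new rounding rule) is made in a single place.
def line_total(lines):
    return sum(quantity * unit_price for quantity, unit_price in lines)

def invoice_total_v2(lines):
    return line_total(lines)

def order_total_v2(lines):
    return line_total(lines)

# The refactoring preserves behavior:
assert invoice_total([(2, 3.0), (1, 4.5)]) == invoice_total_v2([(2, 3.0), (1, 4.5)])
```

The essential property is that refactoring changes the structure of the code without changing what it does, which is why automated tests are so valuable as a safety net for the process.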

In principle, when refactoring is part of the development process, the software should always be easy to understand and change as new requirements are proposed. In practice, this is not always the case. Sometimes development pressure means that refactoring is delayed because the time is devoted to the implementation of new functionality. Some new features and changes cannot readily be accommodated by code-level refactoring and require that the architecture of the system be modified.

3.2.3 Test-first development

As I discussed in the introduction to this chapter, one of the important differences between incremental development and plan-driven development is in the way that the system is tested. With incremental development, there is no system specification that can be used by an external testing team to develop system tests. As a consequence, some approaches to incremental development have a very informal testing process, in comparison with plan-driven testing.

Extreme Programming developed a new approach to program testing to address the difficulties of testing without a specification. Testing is automated and is central to the development process, and development cannot proceed until all tests have been successfully executed. The key features of testing in XP are:

1. test-first development,
2. incremental test development from scenarios,
3. user involvement in the test development and validation, and
4. the use of automated testing frameworks.

XP’s test-first philosophy has now evolved into more general test-driven development techniques (Jeffries and Melnik 2007). I believe that test-driven development is one of the most important innovations in software engineering. Instead of writing code and then writing tests for that code, you write the tests before you write the code. This means that you can run the test as the code is being written and discover problems during development. I discuss test-driven development in more depth in Chapter 8.

Figure 3.7 Test case description for dose checking

Test 4: Dose checking
Input:
1. A number in mg representing a single dose of the drug.
2. A number representing the number of single doses per day.
Tests:
1. Test for inputs where the single dose is correct but the frequency is too high.
2. Test for inputs where the single dose is too high and too low.
3. Test for inputs where the single dose * frequency is too high and too low.
4. Test for inputs where single dose * frequency is in the permitted range.
Output:
OK or error message indicating that the dose is outside the safe range.

Writing tests implicitly defines both an interface and a specification of behavior for the functionality being developed. Problems of requirements and interface misunderstandings are reduced. Test-first development requires there to be a clear relationship between system requirements and the code implementing the corresponding requirements. In XP, this relationship is clear because the story cards representing the requirements are broken down into tasks and the tasks are the principal unit of implementation.

In test-first development, the task implementers have to thoroughly understand the specification so that they can write tests for the system. This means that ambiguities and omissions in the specification have to be clarified before implementation begins. Furthermore, it also avoids the problem of “test-lag.” This may happen when the developer of the system works at a faster pace than the tester. The implementation gets further and further ahead of the testing, and there is a tendency to skip tests so that the development schedule can be maintained.

XP’s test-first approach assumes that user stories have been developed, and these have been broken down into a set of task cards, as shown in Figure 3.6. Each task generates one or more unit tests that check the implementation described in that task. Figure 3.7 is a shortened description of a test case that has been developed to check that the prescribed dose of a drug does not fall outside known safe limits.
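The Figure 3.7 tests can be sketched as executable, test-first code. This is a hypothetical illustration: the book gives neither the safe limits nor an implementation, so the function name and all the numeric ranges below are invented:

```python
# Hypothetical dose checker for the Figure 3.7 test case.
# All limits are assumptions for illustration only.
MIN_SINGLE_DOSE = 50     # mg, assumed lower limit for one dose
MAX_SINGLE_DOSE = 500    # mg, assumed upper limit for one dose
MAX_DOSES_PER_DAY = 4    # assumed maximum frequency
MAX_DAILY_DOSE = 1500    # mg, assumed upper limit for dose * frequency

def check_dose(single_dose_mg, doses_per_day):
    """Return 'OK' or an error message if the dose is outside the safe range."""
    if not MIN_SINGLE_DOSE <= single_dose_mg <= MAX_SINGLE_DOSE:
        return "Error: single dose outside safe range"
    if doses_per_day > MAX_DOSES_PER_DAY:
        return "Error: dose frequency too high"
    if single_dose_mg * doses_per_day > MAX_DAILY_DOSE:
        return "Error: total daily dose too high"
    return "OK"

# Tests written before the implementation, following Figure 3.7
# (simplified: this sketch has no lower bound on the daily total):
assert check_dose(400, 6) != "OK"   # dose correct, frequency too high
assert check_dose(600, 2) != "OK"   # single dose too high
assert check_dose(10, 2) != "OK"    # single dose too low
assert check_dose(500, 4) != "OK"   # dose * frequency too high
assert check_dose(300, 3) == "OK"   # dose * frequency in permitted range
```

In test-first style, these assertions would all fail until `check_dose` is implemented, and they then act as a permanent regression check on the dose-checking behavior.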

The role of the customer in the testing process is to help develop acceptance tests for the stories that are to be implemented in the next release of the system. As I explain in Chapter 8, acceptance testing is the process whereby the system is tested using customer data to check that it meets the customer’s real needs.

Test automation is essential for test-first development. Tests are written as executable components before the task is implemented. These testing components should be stand-alone, should simulate the submission of input to be tested, and should check that the result meets the output specification. An automated test framework is a system that makes it easy to write executable tests and submit a set of tests for execution. JUnit (Tahchiev et al. 2010) is a widely used example of an automated testing framework for Java programs.
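As an illustration of what such a framework provides, here is the same style of stand-alone, executable test written with Python’s built-in unittest module (JUnit offers the analogous structure for Java; the function under test is invented):

```python
import unittest

def add(a, b):
    # Trivial function under test, for illustration only.
    return a + b

class AddTests(unittest.TestCase):
    # Each test method is a stand-alone executable component: it sets up
    # its input, submits it to the code under test, and checks the result
    # against the expected output.
    def test_adds_two_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_adds_negatives(self):
        self.assertEqual(add(-2, -3), -5)

# The framework discovers the tests and submits the whole set for execution.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the framework handles discovery, execution, and reporting, the whole test set can be rerun cheaply every time code changes.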

As testing is automated, there is always a set of tests that can be quickly and easily executed. Whenever any functionality is added to the system, the tests can be run and problems that the new code has introduced can be caught immediately.

Test-first development and automated testing usually result in a large number of tests being written and executed. However, there are problems in ensuring that test coverage is complete:

1. Programmers prefer programming to testing, and sometimes they take shortcuts when writing tests. For example, they may write incomplete tests that do not check for all possible exceptions that may occur.
2. Some tests can be very difficult to write incrementally. For example, in a complex user interface, it is often difficult to write unit tests for the code that implements the “display logic” and workflow between screens.

It is difficult to judge the completeness of a set of tests. Although you may have a lot of system tests, your test set may not provide complete coverage. Crucial parts of the system may not be executed and so will remain untested. Therefore, although a large set of frequently executed tests may give the impression that the system is complete and correct, this may not be the case. If the tests are not reviewed and further tests are written after development, then undetected bugs may be delivered in the system release.

3.2.4 Pair programming

Another innovative practice that was introduced in XP is that programmers work in pairs to develop the software. The programming pair sits at the same computer to develop the software. However, the same pair do not always program together. Rather, pairs are created dynamically so that all team members work with each other during the development process.

Pair programming has a number of advantages.

1. It supports the idea of collective ownership and responsibility for the system. This reflects Weinberg’s idea of egoless programming (Weinberg 1971), where the software is owned by the team as a whole and individuals are not held responsible for problems with the code. Instead, the team has collective responsibility for resolving these problems.

2. It acts as an informal review process because each line of code is looked at by at least two people. Code inspections and reviews (Chapter 24) are effective in discovering a high percentage of software errors. However, they are time-consuming to organize and typically introduce delays into the development process. Pair programming is a less formal process that probably doesn’t find as many errors as code inspections. However, it is cheaper and easier to organize than formal program inspections.

3. It encourages refactoring to improve the software structure. The problem with asking programmers to refactor in a normal development environment is that the effort involved is expended for long-term benefit. A developer who spends time refactoring may be judged to be less efficient than one who simply carries on developing code. Where pair programming and collective ownership are used, others benefit immediately from the refactoring, so they are likely to support the process.

You might think that pair programming would be less efficient than individual programming. In a given time, a pair of developers would produce half as much code as two individuals working alone. Many companies that have adopted agile methods are suspicious of pair programming and do not use it. Other companies mix pair and individual programming, with an experienced programmer working with a less experienced colleague when they have problems.

Formal studies of the value of pair programming have had mixed results. Using student volunteers, Williams and her collaborators (Williams et al. 2000) found that productivity with pair programming seems to be comparable to that of two people working independently. The reasons suggested are that pairs discuss the software before development and so probably have fewer false starts and less rework. Furthermore, the number of errors avoided by the informal inspection is such that less time is spent repairing bugs discovered during the testing process.

However, studies with more experienced programmers did not replicate these results (Arisholm et al. 2007). They found that there was a significant loss of productivity compared with two programmers working alone. There were some quality benefits, but these did not fully compensate for the pair-programming overhead. Nevertheless, the sharing of knowledge that happens during pair programming is very important, as it reduces the overall risks to a project when team members leave. In itself, this may make pair programming worthwhile.

3.3 Agile project management

In any software business, managers need to know what is going on and whether or not a project is likely to meet its objectives and deliver the software on time and within the proposed budget. Plan-driven approaches to software development evolved to meet this need. As I discussed in Chapter 23, managers draw up a plan for the project showing what should be delivered, when it should be delivered, and who will work on the development of the project deliverables. A plan-based approach requires a manager to have a stable view of everything that has to be developed and the development processes.

The informal planning and project control that was proposed by the early adherents of agile methods clashed with this business requirement for visibility. Teams were self-organizing, did not produce documentation, and planned development in very short cycles. While this can and does work for small companies developing software products, it is inappropriate for larger companies who need to know what is going on in their organization.

Like every other professional software development process, agile development has to be managed so that the best use is made of the time and resources available to the team. To address this issue, the Scrum agile method was developed (Schwaber and Beedle 2001; Rubin 2013) to provide a framework for organizing agile projects and, to some extent at least, provide external visibility of what is going on. The developers of Scrum wished to make clear that Scrum was not a method for project management in the conventional sense, so they deliberately invented new terminology, such as ScrumMaster, which replaced names such as project manager. Figure 3.8 summarizes Scrum terminology and what it means.

Figure 3.8 Scrum terminology

Development team: A self-organizing group of software developers, which should be no more than seven people. They are responsible for developing the software and other essential project documents.

Potentially shippable product increment: The software increment that is delivered from a sprint. The idea is that this should be “potentially shippable,” which means that it is in a finished state and no further work, such as testing, is needed to incorporate it into the final product. In practice, this is not always achievable.

Product backlog: A list of “to do” items that the Scrum team must tackle. They may be feature definitions for the software, software requirements, user stories, or descriptions of supplementary tasks that are needed, such as architecture definition or user documentation.

Product owner: An individual (or possibly a small group) whose job is to identify product features or requirements, prioritize these for development, and continuously review the product backlog to ensure that the project continues to meet critical business needs. The Product Owner can be a customer but might also be a product manager in a software company or other stakeholder representative.

Scrum: A daily meeting of the Scrum team that reviews progress and prioritizes work to be done that day. Ideally, this should be a short face-to-face meeting that includes the whole team.

ScrumMaster: The ScrumMaster is responsible for ensuring that the Scrum process is followed and guides the team in the effective use of Scrum. He or she is responsible for interfacing with the rest of the company and for ensuring that the Scrum team is not diverted by outside interference. The Scrum developers are adamant that the ScrumMaster should not be thought of as a project manager. Others, however, may not always find it easy to see the difference.

Sprint: A development iteration. Sprints are usually 2 to 4 weeks long.

Velocity: An estimate of how much product backlog effort a team can cover in a single sprint. Understanding a team’s velocity helps them estimate what can be covered in a sprint and provides a basis for measuring and improving performance.

Scrum is an agile method insofar as it follows the principles from the agile manifesto, which I showed in Figure 3.2. However, it focuses on providing a framework for agile project organization, and it does not mandate the use of specific development practices such as pair programming and test-first development. This means that it can be more easily integrated with existing practice in a company. Consequently, as agile methods have become a mainstream approach to software development, Scrum has emerged as the most widely used method.

Figure 3.9 The Scrum sprint cycle: review the work to be done, select items from the product backlog, plan the sprint, carry out the sprint against the sprint backlog, review the sprint, and deliver potentially shippable software.

The Scrum process or sprint cycle is shown in Figure 3.9. The input to the process is the product backlog. Each process iteration produces a product increment that could be delivered to customers.

The starting point for the Scrum sprint cycle is the product backlog: the list of items such as product features, requirements, and engineering improvements that have to be worked on by the Scrum team. The initial version of the product backlog may be derived from a requirements document, a list of user stories, or other descriptions of the software to be developed.

While the majority of entries in the product backlog are concerned with the implementation of system features, other activities may also be included. Sometimes, when planning an iteration, questions that cannot be easily answered come to light, and additional work is required to explore possible solutions. The team may carry out some prototyping or trial development to understand the problem and solution. There may also be backlog items to design the system architecture or to develop system documentation.

The product backlog may be specified at varying levels of detail, and it is the responsibility of the Product Owner to ensure that the level of detail in the specification is appropriate for the work to be done. For example, a backlog item could be a complete user story such as that shown in Figure 3.5, or it could simply be an instruction such as “Refactor user interface code” that leaves it up to the team to decide on the refactoring to be done.

Each sprint cycle lasts a fixed length of time, which is usually between 2 and 4 weeks. At the beginning of each cycle, the Product Owner prioritizes the items on the product backlog to define which are the most important items to be developed in that cycle. Sprints are never extended to take account of unfinished work. Items are returned to the product backlog if they cannot be completed within the allocated time for the sprint.

The whole team is then involved in selecting which of the highest priority items they believe can be completed. They then estimate the time required to complete these items. To make these estimates, they use the velocity attained in previous sprints, that is, how much of the backlog could be covered in a single sprint. This leads to the creation of a sprint backlog: the work to be done during that sprint. The team self-organizes to decide who will work on what, and the sprint begins.
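The velocity-based selection described above amounts to a simple capacity calculation. A sketch, in which the backlog items, their point estimates, and the velocity figure are all invented for illustration:

```python
# Hypothetical sprint planning: fill the sprint backlog with the
# highest-priority product backlog items that fit the team's velocity.
velocity = 20  # story points per sprint, estimated from previous sprints

# Product backlog in priority order: (item, estimated story points).
product_backlog = [
    ("Dose checking", 8),
    ("Prescription printing", 5),
    ("Audit logging", 8),
    ("Refactor user interface code", 13),
]

sprint_backlog = []
remaining = velocity
for item, points in product_backlog:
    if points <= remaining:      # skip items that no longer fit
        sprint_backlog.append(item)
        remaining -= points

print(sprint_backlog)  # ['Dose checking', 'Prescription printing']
```

In practice the team negotiates rather than applying a greedy rule mechanically, but the principle is the same: past velocity bounds how much of the backlog is committed to the sprint.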

During the sprint, the team holds short daily meetings (Scrums) to review progress and, where necessary, to re-prioritize work. During the Scrum, all team members share information, describe their progress since the last meeting, bring up problems that have arisen, and state what is planned for the following day. Thus, everyone on the team knows what is going on and, if problems arise, can re-plan short-term work to cope with them. Everyone participates in this short-term planning; there is no top-down direction from the ScrumMaster.

The daily interactions among Scrum teams may be coordinated using a Scrum board. This is an office whiteboard that includes information and post-it notes about the sprint backlog, work done, unavailability of staff, and so on. This is a shared resource for the whole team, and anyone can change or move items on the board. It means that any team member can, at a glance, see what others are doing and what work remains to be done.

At the end of each sprint, there is a review meeting, which involves the whole team. This meeting has two purposes. First, it is a means of process improvement. The team reviews the way they have worked and reflects on how things could have been done better. Second, it provides input on the product and the product state for the product backlog review that precedes the next sprint.

While the ScrumMaster is not formally a project manager, in practice ScrumMasters take this role in many organizations that have a conventional management structure. They report on progress to senior management and are involved in longer-term planning and project budgeting. They may be involved in project administration (agreeing on holidays for staff, liaising with HR, etc.) and hardware and software purchases.

In various Scrum success stories (Schatz and Abdelshafi 2005; Mulder and van Vliet 2008; Bellouiti 2009), the things that users like about the Scrum method are:

1. The product is broken down into a set of manageable and understandable chunks that stakeholders can relate to.
2. Unstable requirements do not hold up progress.
3. The whole team has visibility of everything, and consequently team communication and morale are improved.
4. Customers see on-time delivery of increments and gain feedback on how the product works. They are not faced with last-minute surprises when a team announces that software will not be delivered as expected.
5. Trust between customers and developers is established, and a positive culture is created in which everyone expects the project to succeed.

Scrum, as originally designed, was intended for use with co-located teams where all team members could get together every day in stand-up meetings. However, much software development now involves distributed teams, with team members located in different places around the world. This allows companies to take advantage of lower-cost staff in other countries, makes access to specialist skills possible, and allows for 24-hour development, with work going on in different time zones.

Figure 3.10 Distributed Scrum. The requirements include: videoconferencing between the Product Owner and the development team; a ScrumMaster located with the development team, so that he or she is aware of everyday problems; a Product Owner who visits the developers and tries to establish a good relationship with them, as it is essential that they trust each other; real-time communications between team members for informal communication, particularly instant messaging and video calls; a common development environment for all teams; and continuous integration, so that all team members can be aware of the state of the product at any time.

Consequently, there have been developments of Scrum for distributed development environments and multi-team working. Typically, for offshore development, the product owner is in a different country from the development team, which may also be distributed. Figure 3.10 shows the requirements for Distributed Scrum (Deemer 2011).

3.4 Scaling agile methods

Agile methods were developed for use by small programming teams that could work together in the same room and communicate informally. They were originally used for the development of small and medium-sized systems and software products. Small companies, without formal processes or bureaucracy, were enthusiastic initial adopters of these methods.

Of course, the need for faster delivery of software, which is more suited to customer needs, also applies to both larger systems and larger companies. Consequently, over the past few years, a lot of work has been put into evolving agile methods for both large software systems and for use in large companies.

Scaling agile methods has two closely related facets:

1. Scaling up these methods to handle the development of large systems that are too big to be developed by a single small team.
2. Scaling out these methods from specialized development teams to more widespread use in a large company that has many years of software development experience.

Of course, scaling up and scaling out are closely related. Contracts to develop large software systems are usually awarded to large organizations, with multiple teams working on the development project. These large companies have often experimented with agile methods in smaller projects, so they face the problems of scaling up and scaling out at the same time.

There are many anecdotes about the effectiveness of agile methods, and it has been suggested that these can lead to orders of magnitude improvements in productivity and comparable reductions in defects. Ambler (Ambler 2010), an influential agile method developer, suggests that these productivity improvements are exaggerated for large systems and organizations. He suggests that an organization moving to agile methods can expect to see productivity improvement across the organization of about 15% over 3 years, with similar reductions in the number of product defects.

3.4.1 Practical problems with agile methods

In some areas, particularly in the development of software products and apps, agile development has been incredibly successful. It is by far the best approach to use for this type of system. However, agile methods may not be suitable for other types of software development, such as embedded systems engineering or the development of large and complex systems.

For large, long-lifetime systems that are developed by a software company for an external client, using an agile approach presents a number of problems.

1. The informality of agile development is incompatible with the legal approach to contract definition that is commonly used in large companies.
2. Agile methods are most appropriate for new software development rather than for software maintenance. Yet the majority of software costs in large companies come from maintaining their existing software systems.
3. Agile methods are designed for small co-located teams, yet much software development now involves worldwide distributed teams.

Contractual issues can be a major problem when agile methods are used. When the system customer uses an outside organization for system development, a contract for the software development is drawn up between them. The software requirements document is usually part of that contract between the customer and the supplier. Because the interleaved development of requirements and code is fundamental to agile methods, there is no definitive statement of requirements that can be included in the contract.

Consequently, agile methods have to rely on contracts in which the customer pays for the time required for system development rather than the development of a specific set of requirements. As long as all goes well, this benefits both the customer and the developer. However, if problems arise, then there may be difficult disputes over who is to blame and who should pay for the extra time and resources required to resolve the problems.

As I explain in Chapter 9, a huge amount of software engineering effort goes into the maintenance and evolution of existing software systems. Agile practices, such as incremental delivery, design for change, and maintaining simplicity, all make sense when software is being changed. In fact, you can think of an agile development process as a process that supports continual change. If agile methods are used for software product development, new releases of the product or app simply involve continuing the agile approach.

However, where maintenance involves a custom system that must be changed in response to new business requirements, there is no clear consensus on the suitability of agile methods for software maintenance (Bird 2011; Kilner 2012). Three types of problems can arise:

- lack of product documentation
- keeping customers involved
- development team continuity

Formal documentation is supposed to describe the system and so make it easier for people changing the system to understand it. In practice, however, formal documentation is rarely updated and so does not accurately reflect the program code. For this reason, agile methods enthusiasts argue that it is a waste of time to write this documentation and that the key to implementing maintainable software is to produce high-quality, readable code. The lack of documentation should not be a problem in maintaining systems developed using an agile approach.

However, my experience of system maintenance is that the most important document is the system requirements document, which tells the software engineer what the system is supposed to do. Without such knowledge, it is difficult to assess the impact of proposed system changes. Many agile methods collect requirements informally and incrementally and do not create a coherent requirements document. The use of agile methods may therefore make subsequent system maintenance more difficult and expensive. This is a particular problem if development team continuity cannot be maintained.

A key challenge in using an agile approach to maintenance is keeping customers involved in the process. While a customer may be able to justify the full-time involvement of a representative during system development, this is less likely during maintenance, where changes are not continuous. Customer representatives are likely to lose interest in the system. Therefore, it is likely that alternative mechanisms, such as change proposals, discussed in Chapter 25, will have to be adapted to fit in with an agile approach.

Another potential problem that may arise is maintaining continuity of the development team. Agile methods rely on team members understanding aspects of the system without having to consult documentation. If an agile development team is broken up, then this implicit knowledge is lost, and it is difficult for new team members to build up the same understanding of the system and its components. Many programmers prefer working on new development to software maintenance, and so they are unwilling to continue to work on a software system after the first release has been delivered. Therefore, even when the intention is to keep the development team together, people leave if they are assigned maintenance tasks.

Figure 3.11 Agile principles and organizational practice

Customer involvement: This depends on having a customer who is willing and able to spend time with the development team and who can represent all system stakeholders. Often, customer representatives have other demands on their time and cannot play a full part in the software development. Where there are external stakeholders, such as regulators, it is difficult to represent their views to the agile team.

Embrace change: Prioritizing changes can be extremely difficult, especially in systems for which there are many stakeholders. Typically, each stakeholder gives different priorities to different changes.

Incremental delivery: Rapid iterations and short-term planning for development do not always fit in with the longer-term planning cycles of business planning and marketing. Marketing managers may need to know product features several months in advance to prepare an effective marketing campaign.

Maintain simplicity: Under pressure from delivery schedules, team members may not have time to carry out desirable system simplifications.

People, not process: Individual team members may not have suitable personalities for the intense involvement that is typical of agile methods and therefore may not interact well with other team members.

3.4.2 Agile and plan-driven methods

A fundamental requirement of scaling agile methods is to integrate them with plan-driven approaches. Small startup companies can work with informal and short-term planning, but larger companies have to have longer-term plans and budgets for investment, staffing, and business development. Their software development must support these plans, so longer-term software planning is essential.

Early adopters of agile methods in the first decade of the 21st century were enthusiasts and deeply committed to the agile manifesto. They deliberately rejected the plan-driven approach to software engineering and were reluctant to change the initial vision of agile methods in any way. However, as organizations saw the value and benefits of an agile approach, they adapted these methods to suit their own culture and ways of working. They had to do this because the principles underlying agile methods are sometimes difficult to realize in practice (Figure 3.11).

To address these problems, most large “agile” software development projects combine practices from plan-driven and agile approaches. Some are mostly agile, and others are mostly plan-driven but with some agile practices. To decide on the balance between a plan-based and an agile approach, you have to answer a range of technical, human and organizational questions. These relate to the system being developed, the development team, and the organizations that are developing and procuring the system (Figure 3.12).

92 Chapter 3 Agile software development

Figure 3.12 Factors influencing the choice of plan-based or agile development (factors relating to the system, the team, and the organization: type, lifetime, scale, regulation, technology, competence, distribution, contracts, delivery, culture)

Agile methods were developed and refined in projects to develop small to medium-sized business systems and software products, where the software developer controls the specification of the system. Other types of system have attributes such as size, complexity, real-time response, and external regulation that mean a “pure” agile approach is unlikely to work. There needs to be some up-front planning, design, and documentation in the systems engineering process. Some of the key issues are as follows:

1. How large is the system that is being developed? Agile methods are most effective when the system can be developed with a relatively small co-located team who can communicate informally. This may not be possible for large systems that require larger development teams, so a plan-driven approach may have to be used.

2. What type of system is being developed? Systems that require a lot of analysis before implementation (e.g., real-time systems with complex timing requirements) usually need a fairly detailed design to carry out this analysis. A plan-driven approach may be best in those circumstances.

3. What is the expected system lifetime? Long-lifetime systems may require more design documentation to communicate the original intentions of the system developers to the support team. However, supporters of agile methods rightly argue that documentation is frequently not kept up to date and is not of much use for long-term system maintenance.

4. Is the system subject to external regulation? If a system has to be approved by an external regulator (e.g., the Federal Aviation Administration approves software that is critical to the operation of an aircraft), then you will probably be required to produce detailed documentation as part of the system safety case.

Agile methods place a great deal of responsibility on the development team to cooperate and communicate during the development of the system. They rely on individual engineering skills and software support for the development process. However, in reality, not everyone is a highly skilled engineer, people do not communicate effectively, and it is not always possible for teams to work together. Some planning may be required to make the most effective use of the people available. Key issues are:

1. How good are the designers and programmers in the development team? It is sometimes argued that agile methods require higher skill levels than plan-based approaches in which programmers simply translate a detailed design into code. If you have a team with relatively low skill levels, you may need to use the best people to develop the design, with others responsible for programming.


2. How is the development team organized? If the development team is distributed or if part of the development is being outsourced, then you may need to develop design documents to communicate across the development teams.

3. What technologies are available to support system development? Agile methods often rely on good tools to keep track of an evolving design. If you are developing a system using an IDE that does not have good tools for program visualization and analysis, then more design documentation may be required.

Television and films have created a popular vision of software companies as informal organizations run by young men (mostly) who provide a fashionable working environment, with a minimum of bureaucracy and organizational procedures. This is far from the truth. Most software is developed in large companies that have established their own working practices and procedures. Management in these companies may be uncomfortable with the lack of documentation and the informal decision making in agile methods. Key issues are:

1. Is it important to have a very detailed specification and design before moving to implementation, perhaps for contractual reasons? If so, you probably need to use a plan-driven approach for requirements engineering but may use agile development practices during system implementation.

2. Is an incremental delivery strategy, where you deliver the software to customers or other system stakeholders and get rapid feedback from them, realistic? Will customer representatives be available, and are they willing to participate in the development team?

3. Are there cultural issues that may affect system development? Traditional engineering organizations have a culture of plan-based development, as this is the norm in engineering. This usually requires extensive design documentation rather than the informal knowledge used in agile processes.

In reality, the issue of whether a project can be labeled as plan-driven or agile is not very important. Ultimately, the primary concern of buyers of a software system is whether or not they have an executable software system that meets their needs and does useful things for the individual user or the organization. Software developers should be pragmatic and should choose those methods that are most effective for the type of system being developed, whether or not these are labeled agile or plan-driven.

3.4.3 Agile methods for large systems

Agile methods have to evolve to be used for large-scale software development. The fundamental reason for this is that large-scale software systems are much more complex and difficult to understand and manage than small-scale systems or software products. Six principal factors (Figure 3.13) contribute to this complexity:


Figure 3.13 Large project characteristics (six factors surrounding a large software system: system of systems, brownfield development, diverse stakeholders, prolonged procurement, regulatory constraints, and system configuration)

1. Large systems are usually systems of systems—collections of separate, communicating systems, where separate teams develop each system. Frequently, these teams are working in different places, sometimes in different time zones. It is practically impossible for each team to have a view of the whole system. Consequently, their priorities are usually to complete their part of the system without regard for wider systems issues.

2. Large systems are brownfield systems (Hopkins and Jenkins 2008); that is, they include and interact with a number of existing systems. Many of the system requirements are concerned with this interaction and so don’t really lend themselves to flexibility and incremental development. Political issues can also be significant here—often the easiest solution to a problem is to change an existing system. However, this requires negotiation with the managers of that system to convince them that the changes can be implemented without risk to the system’s operation.

3. Where several systems are integrated to create a system, a significant fraction of the development is concerned with system configuration rather than original code development. This is not necessarily compatible with incremental development and frequent system integration.

4. Large systems and their development processes are often constrained by external rules and regulations limiting the way that they can be developed, that require certain types of system documentation to be produced, and so on. Customers may have specific compliance requirements that may have to be followed, and these may require process documentation to be completed.

5. Large systems have a long procurement and development time. It is difficult to maintain coherent teams who know about the system over that period as, inevitably, people move on to other jobs and projects.

6. Large systems usually have a diverse set of stakeholders with different perspectives and objectives. For example, nurses and administrators may be the end-users of a medical system, but senior medical staff, hospital managers, and others, are also stakeholders in the system. It is practically impossible to involve all of these different stakeholders in the development process.


Figure 3.14 IBM’s Agility at Scale model (© IBM 2010). The model has three nested levels: core agile development (value-driven life-cycle, self-organizing teams, focus on construction); disciplined agile delivery (risk+value driven life-cycle, self-organizing teams with an appropriate governance framework, full delivery life-cycle); and agility at scale (disciplined agile delivery where scaling factors apply: large team size, geographic distribution, regulatory compliance, domain complexity, organization distribution, technical complexity, organizational complexity, enterprise discipline).

Dean Leffingwell, who has a great deal of experience in scaling agile methods, has developed the Scaled Agile Framework (Leffingwell 2007, 2011) to support large-scale, multi-team software development. He reports how this method has been used successfully in a number of large companies. IBM has also developed a framework for the large-scale use of agile methods called the Agile Scaling Model (ASM). Figure 3.14, taken from Ambler’s white paper that discusses ASM (Ambler 2010), shows an overview of this model.

The ASM recognizes that scaling is a staged process where development teams move from the core agile practices discussed here to what is called Disciplined Agile Delivery. Essentially, this stage involves adapting these practices to a disciplined organizational setting and recognizing that teams cannot simply focus on development but must also take into account other stages of the software engineering process, such as requirements and architectural design.

The final scaling stage in ASM is to move to Agility at Scale where the complexity that is inherent in large projects is recognized. This involves taking account of factors such as distributed development, complex legacy environments, and regulatory compliance requirements. The practices used for disciplined agile delivery may have to be modified on a project-by-project basis to take these into account and, sometimes, additional plan-based practices added to the process.

No single model is appropriate for all large-scale agile products as the type of product, the customer requirements, and the people available are all different. However, approaches to scaling agile methods have a number of things in common:


1. A completely incremental approach to requirements engineering is impossible. Some early work on initial software requirements is essential. You need this work to identify the different parts of the system that may be developed by different teams and, often, to be part of the contract for the system development. However, these requirements should not normally be specified in detail; details are best developed incrementally.

2. There cannot be a single product owner or customer representative. Different people have to be involved for different parts of the system, and they have to continuously communicate and negotiate throughout the development process.

3. It is not possible to focus only on the code of the system. You need to do more up-front design and system documentation. The software architecture has to be designed, and there has to be documentation produced to describe critical aspects of the system, such as database schemas and the work breakdown across teams.

4. Cross-team communication mechanisms have to be designed and used. This should involve regular phone and videoconferences between team members and frequent, short electronic meetings where teams update each other on progress. A range of communication channels such as email, instant messaging, wikis, and social networking systems should be provided to facilitate communications.

5. Continuous integration, where the whole system is built every time any developer checks in a change, is practically impossible when several separate programs have to be integrated to create the system. However, it is essential to maintain frequent system builds and regular releases of the system. Configuration management tools that support multi-team software development are essential.
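The build policy in point 5 can be made concrete. Instead of rebuilding the whole system on every check-in, a scheduled integration job builds each team's component independently and assembles a system build only when every component succeeds. A minimal sketch of that policy, with invented component names and build callables standing in for real build tools:

```python
# Sketch of a scheduled multi-team integration build: each team's
# component is built independently, and a system build is assembled
# only if every component build succeeds. All names are hypothetical.

def integration_build(components):
    """components maps a component name to a zero-argument build
    callable returning True (success) or False (failure).
    Returns (system_built, list_of_failed_components)."""
    failures = [name for name, build in components.items() if not build()]
    return (len(failures) == 0, failures)

# Example: two component builds pass, one fails, so no system build
# is produced and the failing team is identified for follow-up.
result, broken = integration_build({
    "billing": lambda: True,
    "scheduling": lambda: True,
    "reporting": lambda: False,
})
```

In practice such a job would run nightly under a configuration management tool; the point is that integration stays frequent even when per-commit whole-system builds are impractical.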

Scrum has been adapted for large-scale development. In essence, the Scrum team model described in Section 3.3 is maintained, but multiple Scrum teams are set up. The key characteristics of multi-team Scrum are:

1. Role replication: Each team has a Product Owner for its work component and a ScrumMaster. There may be a chief Product Owner and ScrumMaster for the entire project.

2. Product architects: Each team chooses a product architect, and these architects collaborate to design and evolve the overall system architecture.

3. Release alignment: The dates of product releases from each team are aligned so that a demonstrable and complete system is produced.

4. Scrum of Scrums: There is a daily Scrum of Scrums where representatives from each team meet to discuss progress, identify problems, and plan the work to be done that day. Individual team Scrums may be staggered in time so that representatives from other teams can attend if necessary.
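Release alignment has a simple arithmetic core: if teams run fixed-length sprints that start together, a complete demonstrable system can only be released on days when every team's sprint ends, i.e. at the least common multiple of the sprint lengths. A toy illustration (team sprint lengths are invented):

```python
# Sketch: when do all teams' sprints end together? With fixed-length
# sprints starting at the same time, aligned release points fall at
# the least common multiple of the sprint lengths (weeks here).
from math import lcm

def aligned_release_interval(sprint_lengths):
    """Weeks between system-wide release points for the given teams."""
    return lcm(*sprint_lengths)

# Teams on 2-, 3-, and 4-week sprints only line up every 12 weeks,
# which is one reason multi-team Scrum tends to standardize sprint
# length across teams.
interval = aligned_release_interval([2, 3, 4])
```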


3.4.4 Agile methods across organizations

Small software companies that develop software products have been among the most enthusiastic adopters of agile methods. These companies are not constrained by organizational bureaucracies or process standards, and they can change quickly to adopt new ideas. Of course, larger companies have also experimented with agile methods in specific projects, but it is much more difficult for them to “scale out” these methods across the organization.

It can be difficult to introduce agile methods into large companies for a number of reasons:

1. Project managers who do not have experience of agile methods may be reluctant to accept the risk of a new approach, as they do not know how this will affect their particular projects.

2. Large organizations often have quality procedures and standards that all projects are expected to follow, and, because of their bureaucratic nature, these are likely to be incompatible with agile methods. Sometimes, these are supported by software tools (e.g., requirements management tools), and the use of these tools is mandated for all projects.

3. Agile methods seem to work best when team members have a relatively high skill level. However, within large organizations, there are likely to be a wide range of skills and abilities, and people with lower skill levels may not be effective team members in agile processes.

4. There may be cultural resistance to agile methods, especially in those organizations that have a long history of using conventional systems engineering processes.

Change management and testing procedures are examples of company procedures that may not be compatible with agile methods. Change management is the process of controlling changes to a system, so that the impact of changes is predictable and costs are controlled. All changes have to be approved in advance before they are made, and this conflicts with the notion of refactoring. When refactoring is part of an agile process, any developer can improve any code without getting external approval. For large systems, there are also testing standards where a system build is handed over to an external testing team. This may conflict with test-first approaches used in agile development methods.
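To make the contrast concrete: in test-first development the test exists, and fails, before the code it exercises, so there is nothing to "hand over" to a separate testing team. A minimal sketch of the practice (both names are invented examples):

```python
# Sketch of test-first development: the test function below is written
# first and initially fails; overdue() is then written to make it pass.
# Both names are invented for illustration.

def overdue(invoice_age_days, terms_days=30):
    """True if an invoice is older than its payment terms."""
    return invoice_age_days > terms_days

def test_overdue_boundary():
    # Written before overdue() existed: the test pins down the exact
    # boundary behavior the code must satisfy.
    assert not overdue(30)   # exactly on terms: not overdue
    assert overdue(31)       # one day past terms: overdue

test_overdue_boundary()  # in practice a test runner would collect this
```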

Introducing and sustaining the use of agile methods across a large organization is a process of cultural change. Cultural change takes a long time to implement and often requires a change of management before it can be accomplished. Companies wishing to use agile methods need evangelists to promote change. Rather than trying to force agile methods onto unwilling developers, companies have found that the best way to introduce agile is bit by bit, starting with an enthusiastic group of developers. A successful agile project can act as a starting point, with the project team spreading agile practice across the organization. Once the notion of agile is widely known, explicit actions can then be taken to spread it across the organization.


Key Points

Agile methods are iterative development methods that focus on reducing process overheads and documentation and on incremental software delivery. They involve customer representatives directly in the development process.

The decision on whether to use an agile or a plan-driven approach to development should depend on the type of software being developed, the capabilities of the development team, and the culture of the company developing the system. In practice, a mix of agile and plan-based techniques may be used.

Agile development practices include requirements expressed as user stories, pair programming, refactoring, continuous integration, and test-first development.

Scrum is an agile method that provides a framework for organizing agile projects. It is centered around a set of sprints, which are fixed time periods when a system increment is developed. Planning is based on prioritizing a backlog of work and selecting the highest priority tasks for a sprint.

To scale agile methods, some plan-based practices have to be integrated with agile practice. These include up-front requirements, multiple customer representatives, more documentation, common tooling across project teams, and the alignment of releases across teams.

Further Reading

“Get Ready for Agile Methods, With Care.” A thoughtful critique of agile methods that discusses their strengths and weaknesses, written by a vastly experienced software engineer. Still very relevant, although almost 15 years old. (B. Boehm, IEEE Computer, January 2002) http://dx.doi.org/10.1109/2.976920

Extreme Programming Explained. This was the first book on XP and is still, perhaps, the most readable. It explains the approach from the perspective of one of its inventors, and his enthusiasm comes through very clearly in the book. (K. Beck and C. Andres, Addison-Wesley, 2004)

Essential Scrum: A Practical Guide to the Most Popular Agile Process. This is a comprehensive and readable description of the 2011 development of the Scrum method. (K.S. Rubin, Addison-Wesley, 2013)

“Agility at Scale: Economic Governance, Measured Improvement and Disciplined Delivery.” This paper discusses IBM’s approach to scaling agile methods, where they have a systematic approach to integrating plan-based and agile development. It is an excellent and thoughtful discussion of the key issues in scaling agile. (A.W. Brown, S.W. Ambler, and W. Royce, Proc. 35th Int. Conf. on Software Engineering, 2013) http://dx.doi.org/10.1145/12944.12948

Web Site

PowerPoint slides for this chapter:

www.pearsonglobaleditions.com/Sommerville

Links to supporting videos:

http://software-engineering-book.com/videos/agile-methods/


Exercises

3.1. At the end of their study program, students in a software engineering course are typically expected to complete a major project. Explain how the agile methodology may be very useful for the students to use in this case.

3.2. Explain how the principles underlying agile methods lead to the accelerated development and deployment of software.

3.3. Extreme programming expresses user requirements as stories, with each story written on a card. Discuss the advantages and disadvantages of this approach to requirements description.

3.4. In test-first development, tests are written before the code. Explain how the test suite may compromise the quality of the software system being developed.

3.5. Suggest four reasons why the productivity rate of programmers working as a pair might be more than half that of two programmers working individually.

3.6. Compare and contrast the Scrum approach to project management with conventional plan-based approaches as discussed in Chapter 23. Your comparison should be based on the effectiveness of each approach for planning the allocation of people to projects, estimating the cost of projects, maintaining team cohesion, and managing changes in project team membership.

3.7. To reduce costs and the environmental impact of commuting, your company decides to close a number of offices and to provide support for staff to work from home. However, the senior management who introduce the policy are unaware that software is developed using Scrum. Explain how you could use technology to support Scrum in a distributed environment to make this possible. What problems are you likely to encounter using this approach?

3.8. Why is it necessary to introduce some methods and documentation from plan-based approaches when scaling agile methods to larger projects that are developed by distributed development teams?

3.9. Explain why agile methods may not work well in organizations that have teams with a wide range of skills and abilities and well-established processes.

3.10. One of the problems of having a user closely involved with a software development team is that they “go native.” That is, they adopt the outlook of the development team and lose sight of the needs of their user colleagues. Suggest three ways in which you might avoid this problem, and discuss the advantages and disadvantages of each approach.

References

Ambler, S. W. 2010. “Scaling Agile: An Executive Guide.” http://www.ibm.com/developerworks/community/blogs/ambler/entry/scaling_agile_an_executive_guide10/

Arisholm, E., H. Gallis, T. Dyba, and D. I. K. Sjoberg. 2007. “Evaluating Pair Programming with Respect to System Complexity and Programmer Expertise.” IEEE Trans. on Software Eng. 33 (2): 65–86. doi:10.1109/TSE.2007.17.

Beck, K. 1998. “Chrysler Goes to ‘Extremes.’” Distributed Computing (10): 24–28.

Beck, K. 1999. “Embracing Change with Extreme Programming.” IEEE Computer 32 (10): 70–78. doi:10.1109/2.796139.

Bellouiti, S. 2009. “How Scrum Helped Our A-Team.” http://www.scrumalliance.org/community/articles/2009/2009-june/how-scrum-helped-our team

Bird, J. 2011. “You Can’t Be Agile in Maintenance.” http://swreflections.blogspot.co.uk/2011/10/you-cant-be-agile-in-maintenance.html

Deemer, P. 2011. “The Distributed Scrum Primer.” http://www.goodagile.com/distributedscrumprimer/

Fowler, M., K. Beck, J. Brant, W. Opdyke, and D. Roberts. 1999. Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley.

Hopkins, R., and K. Jenkins. 2008. Eating the IT Elephant: Moving from Greenfield Development to Brownfield. Boston: IBM Press.

Jeffries, R., and G. Melnik. 2007. “TDD: The Art of Fearless Programming.” IEEE Software 24: 24–30. doi:10.1109/MS.2007.75.

Kilner, S. 2012. “Can Agile Methods Work for Software Maintenance.” http://www.vlegaci.com/can-agile-methods-work-for-software-maintenance-part-1/

Larman, C., and V. R. Basili. 2003. “Iterative and Incremental Development: A Brief History.” IEEE Computer 36 (6): 47–56. doi:10.1109/MC.2003.1204375.

Leffingwell, D. 2007. Scaling Software Agility: Best Practices for Large Enterprises. Boston: Addison-Wesley.

Leffingwell, D. 2011. Agile Software Requirements: Lean Requirements Practices for Teams, Programs and the Enterprise. Boston: Addison-Wesley.

Mulder, M., and M. van Vliet. 2008. “Case Study: Distributed Scrum Project for Dutch Railways.” InfoQ. http://www.infoq.com/articles/dutch-railway-scrum

Rubin, K. S. 2013. Essential Scrum. Boston: Addison-Wesley.

Schatz, B., and I. Abdelshafi. 2005. “Primavera Gets Agile: A Successful Transition to Agile Development.” IEEE Software 22 (3): 36–42. doi:10.1109/MS.2005.74.

Schwaber, K., and M. Beedle. 2001. Agile Software Development with Scrum. Englewood Cliffs, NJ: Prentice-Hall.

Stapleton, J. 2003. DSDM: Business Focused Development, 2nd ed. Harlow, UK: Pearson Education.

Tahchiev, P., F. Leme, V. Massol, and G. Gregory. 2010. JUnit in Action, 2nd ed. Greenwich, CT: Manning Publications.

Weinberg, G. 1971. The Psychology of Computer Programming. New York: Van Nostrand.

Williams, L., R. R. Kessler, W. Cunningham, and R. Jeffries. 2000. “Strengthening the Case for Pair Programming.” IEEE Software 17 (4): 19–25. doi:10.1109/52.854064.

4 Requirements engineering

Objectives

The objective of this chapter is to introduce software requirements and to explain the processes involved in discovering and documenting these requirements. When you have read the chapter, you will:

understand the concepts of user and system requirements and why these requirements should be written in different ways;

understand the differences between functional and non-functional software requirements;

understand the main requirements engineering activities of elicitation, analysis, and validation, and the relationships between these activities;

understand why requirements management is necessary and how it supports other requirements engineering activities.

Contents

4.1 Functional and non-functional requirements

4.2 Requirements engineering processes

4.3 Requirements elicitation

4.4 Requirements specification

4.5 Requirements validation

4.6 Requirements change

102 Chapter 4 Requirements engineering

The requirements for a system are the descriptions of the services that a system should provide and the constraints on its operation. These requirements reflect the needs of customers for a system that serves a certain purpose such as controlling a device, placing an order, or finding information. The process of finding out, analyzing, documenting and checking these services and constraints is called requirements engineering (RE).

The term requirement is not used consistently in the software industry. In some cases, a requirement is simply a high-level, abstract statement of a service that a system should provide or a constraint on a system. At the other extreme, it is a detailed, formal definition of a system function. Davis (Davis 1993)† explains why these differences exist:

If a company wishes to let a contract for a large software development project, it must define its needs in a sufficiently abstract way that a solution is not predefined. The requirements must be written so that several contractors can bid for the contract, offering, perhaps, different ways of meeting the client organization’s needs. Once a contract has been awarded, the contractor must write a system definition for the client in more detail so that the client understands and can validate what the software will do. Both of these documents may be called the requirements document for the system.

Some of the problems that arise during the requirements engineering process are a result of failing to make a clear separation between these different levels of description. I distinguish between them by using the term user requirements to mean the high-level abstract requirements and system requirements to mean the detailed description of what the system should do. User requirements and system requirements may be defined as follows:

1. User requirements are statements, in a natural language plus diagrams, of what services the system is expected to provide to system users and the constraints under which it must operate. The user requirements may vary from broad statements of the system features required to detailed, precise descriptions of the system functionality.

2. System requirements are more detailed descriptions of the software system’s functions, services, and operational constraints. The system requirements document (sometimes called a functional specification) should define exactly what is to be implemented. It may be part of the contract between the system buyer and the software developers.

Different kinds of requirement are needed to communicate information about a system to different types of reader. Figure 4.1 illustrates the distinction between user and system requirements. This example from the mental health care patient information system (Mentcare) shows how a user requirement may be expanded into several system requirements. You can see from Figure 4.1 that the user requirement is quite general. The system requirements provide more specific information about the services and functions of the system that is to be implemented.

†Davis, A. M. 1993. Software Requirements: Objects, Functions and States. Englewood Cliffs, NJ: Prentice-Hall.

Figure 4.1 User and system requirements

User requirements definition

1. The Mentcare system shall generate monthly management reports showing the cost of drugs prescribed by each clinic during that month.

System requirements specification

1.1 On the last working day of each month, a summary of the drugs prescribed, their cost and the prescribing clinics shall be generated.

1.2 The system shall generate the report for printing after 17.30 on the last working day of the month.

1.3 A report shall be created for each clinic and shall list the individual drug names, the total number of prescriptions, the number of doses prescribed and the total cost of the prescribed drugs.

1.4 If drugs are available in different dose units (e.g., 10mg, 20mg, etc.) separate reports shall be created for each dose unit.

1.5 Access to drug cost reports shall be restricted to authorized users as listed on a management access control list.
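One mark of a good system requirement, such as 1.3 in Figure 4.1, is that it is precise enough to be implemented and checked directly. A sketch of the per-clinic aggregation it describes (the prescription record format and the sample data are invented; the real Mentcare data model is not given here):

```python
# Sketch of the aggregation in system requirement 1.3: for one clinic
# and month, list each drug with the number of prescriptions, total
# doses prescribed, and total cost. The record format is hypothetical.
from collections import defaultdict

def clinic_drug_report(prescriptions):
    """prescriptions: iterable of (drug_name, doses, cost) records for
    one clinic and month. Returns {drug: (n_prescriptions, doses, cost)}."""
    totals = defaultdict(lambda: [0, 0, 0.0])
    for drug, doses, cost in prescriptions:
        entry = totals[drug]
        entry[0] += 1        # number of prescriptions
        entry[1] += doses    # total doses prescribed
        entry[2] += cost     # total cost of the prescribed drug
    return {drug: tuple(v) for drug, v in totals.items()}

# Invented sample month for one clinic.
month = [("Diazepam 10mg", 28, 4.50),
         ("Diazepam 10mg", 14, 2.25),
         ("Sertraline 50mg", 28, 1.90)]
report = clinic_drug_report(month)
```

Requirement 1.4 (separate reports per dose unit) is handled here by keeping the dose unit in the drug name; a fuller design would model it as a separate field.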

You need to write requirements at different levels of detail because different types of readers use them in different ways. Figure 4.2 shows the types of readers of the user and system requirements. The readers of the user requirements are not usually concerned with how the system will be implemented and may be managers who are not interested in the detailed facilities of the system. The readers of the system requirements need to know more precisely what the system will do because they are concerned with how it will support the business processes or because they are involved in the system implementation.

The different types of document readers shown in Figure 4.2 are examples of system stakeholders. As well as users, many other people have some kind of interest in the system. System stakeholders include anyone who is affected by the system in some way and so anyone who has a legitimate interest in it. Stakeholders range from end-users of a system through managers to external stakeholders such as regulators, who certify the acceptability of the system.

Figure 4.2 Readers of different types of requirements specification (user requirements are read by client managers, system end-users, client engineers, contractor managers, and system architects; system requirements are read by system end-users, client engineers, system architects, and software developers)

Feasibility studies

A feasibility study is a short, focused study that should take place early in the RE process. It should answer three key questions: (1) Does the system contribute to the overall objectives of the organization? (2) Can the system be implemented within schedule and budget using current technology? and (3) Can the system be integrated with other systems that are used? If the answer to any of these questions is no, you should probably not go ahead with the project.

http://software-engineering-book.com/web/feasibility-study/

For example, system stakeholders for the Mentcare system include:

1. Patients whose information is recorded in the system and relatives of

these patients.

2. Doctors who are responsible for assessing and treating patients.

3. Nurses who coordinate the consultations with doctors and administer

some

treatments.

4. Medical receptionists who manage patients’ appointments.

5. IT staff who are responsible for installing and maintaining the system.

6. A medical ethics manager who must ensure that the system meets

current ethi-

cal guidelines for patient care.

7. Health care managers who obtain management information from the

system.

8. Medical records staff who are responsible for ensuring that system

information

can be maintained and preserved, and that record keeping procedures

have been

properly implemented.

Requirements engineering is usually presented as the first stage of the software engineering process. However, some understanding of the system requirements may have to be developed before a decision is made to go ahead with the procurement or development of a system. This early-stage RE establishes a high-level view of what the system might do and the benefits that it might provide. These may then be considered in a feasibility study, which tries to assess whether or not the system is technically and financially feasible. The results of that study help management decide whether or not to go ahead with the procurement or development of the system.

In this chapter, I present a "traditional" view of requirements rather than requirements in agile processes, which I discussed in Chapter 3. For the majority of large systems, it is still the case that there is a clearly identifiable requirements engineering phase before implementation of the system begins. The outcome is a requirements document, which may be part of the system development contract. Of course, subsequent changes are made to the requirements, and user requirements may be expanded into more detailed system requirements. Sometimes an agile approach of concurrently eliciting the requirements as the system is developed may be used to add detail and to refine the user requirements.

4.1 Functional and non-functional requirements

Software system requirements are often classified as functional or non-functional requirements:

1. Functional requirements These are statements of services the system should provide, how the system should react to particular inputs, and how the system should behave in particular situations. In some cases, the functional requirements may also explicitly state what the system should not do.

2. Non-functional requirements These are constraints on the services or functions offered by the system. They include timing constraints, constraints on the development process, and constraints imposed by standards. Non-functional requirements often apply to the system as a whole rather than individual system features or services.

In reality, the distinction between different types of requirements is not as clear-cut as these simple definitions suggest. A user requirement concerned with security, such as a statement limiting access to authorized users, may appear to be a non-functional requirement. However, when developed in more detail, this requirement may generate other requirements that are clearly functional, such as the need to include user authentication facilities in the system.

This shows that requirements are not independent and that one requirement often generates or constrains other requirements. The system requirements therefore do not just specify the services or the features of the system that are required; they also specify the necessary functionality to ensure that these services/features are delivered effectively.

4.1.1 Functional requirements

The functional requirements for a system describe what the system should do. These requirements depend on the type of software being developed, the expected users of the software, and the general approach taken by the organization when writing requirements.

When expressed as user requirements, functional requirements should be written in natural language so that system users and managers can understand them. Functional system requirements expand the user requirements and are written for system developers. They should describe the system functions, their inputs and outputs, and exceptions in detail.

Domain requirements

Domain requirements are derived from the application domain of the system rather than from the specific needs of system users. They may be new functional requirements in their own right, constrain existing functional requirements, or set out how particular computations must be carried out.

The problem with domain requirements is that software engineers may not understand the characteristics of the domain in which the system operates. This means that these engineers may not know whether or not a domain requirement has been missed out or conflicts with other requirements.

http://software-engineering-book.com/web/domain-requirements/

Functional system requirements vary from general requirements covering what the system should do to very specific requirements reflecting local ways of working or an organization's existing systems. For example, here are some functional requirements for the Mentcare system, used to maintain information about patients receiving treatment for mental health problems:

1. A user shall be able to search the appointments lists for all clinics.
2. The system shall generate each day, for each clinic, a list of patients who are expected to attend appointments that day.
3. Each staff member using the system shall be uniquely identified by his or her eight-digit employee number.

These user requirements define specific functionality that should be included in the system. The requirements show that functional requirements may be written at different levels of detail (contrast requirements 1 and 3).
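A requirement as specific as requirement 3 can be checked mechanically. As a minimal sketch (the function name and pattern below are illustrative, not part of the Mentcare specification), the identifier format could be validated like this:

```python
import re

# Hypothetical validator for requirement 3: each staff member is uniquely
# identified by an eight-digit employee number. The name and pattern are
# my own; the specification only states the eight-digit format.
def is_valid_employee_id(candidate: str) -> bool:
    """Return True only if the candidate is exactly eight decimal digits."""
    return re.fullmatch(r"[0-9]{8}", candidate) is not None
```

A requirement at this level of detail translates almost directly into a test; requirement 1, by contrast, leaves the behavior of "search" open to interpretation, as discussed below.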

Functional requirements, as the name suggests, have traditionally focused on what the system should do. However, if an organization decides that an existing off-the-shelf software product can meet its needs, then there is very little point in developing a detailed functional specification. In such cases, the focus should be on the development of information requirements that specify the information needed for people to do their work. Information requirements specify the information needed and how it is to be delivered and organized. Therefore, an information requirement for the Mentcare system might specify what information is to be included in the list of patients expected for appointments that day.

Imprecision in the requirements specification can lead to disputes between customers and software developers. It is natural for a system developer to interpret an ambiguous requirement in a way that simplifies its implementation. Often, however, this is not what the customer wants. New requirements have to be established and changes made to the system. Of course, this delays system delivery and increases costs.

For example, the first Mentcare system requirement in the above list states that a user shall be able to search the appointments lists for all clinics. The rationale for this requirement is that patients with mental health problems are sometimes confused. They may have an appointment at one clinic but actually go to a different clinic. If they have an appointment, they will be recorded as having attended, regardless of the clinic.


A medical staff member specifying a search requirement may expect "search" to mean that, given a patient name, the system looks for that name in all appointments at all clinics. However, this is not explicit in the requirement. System developers may interpret the requirement so that it is easier to implement. Their search function may require the user to choose a clinic and then carry out the search of the patients who attended that clinic. This involves more user input and so takes longer to complete the search.

Ideally, the functional requirements specification of a system should be both complete and consistent. Completeness means that all services and information required by the user should be defined. Consistency means that requirements should not be contradictory.

In practice, it is only possible to achieve requirements consistency and completeness for very small software systems. One reason is that it is easy to make mistakes and omissions when writing specifications for large, complex systems. Another reason is that large systems have many stakeholders, with different backgrounds and expectations. Stakeholders are likely to have different, and often inconsistent, needs. These inconsistencies may not be obvious when the requirements are originally specified, and the inconsistent requirements may only be discovered after deeper analysis or during system development.

4.1.2 Non-functional requirements

Non-functional requirements, as the name suggests, are requirements that are not directly concerned with the specific services delivered by the system to its users. These non-functional requirements usually specify or constrain characteristics of the system as a whole. They may relate to emergent system properties such as reliability, response time, and memory use. Alternatively, they may define constraints on the system implementation, such as the capabilities of I/O devices or the data representations used in interfaces with other systems.

Non-functional requirements are often more critical than individual functional requirements. System users can usually find ways to work around a system function that doesn't really meet their needs. However, failing to meet a non-functional requirement can mean that the whole system is unusable. For example, if an aircraft system does not meet its reliability requirements, it will not be certified as safe for operation; if an embedded control system fails to meet its performance requirements, the control functions will not operate correctly.

While it is often possible to identify which system components implement specific functional requirements (e.g., there may be formatting components that implement reporting requirements), this is often more difficult with non-functional requirements. The implementation of these requirements may be spread throughout the system, for two reasons:

1. Non-functional requirements may affect the overall architecture of a system rather than the individual components. For example, to ensure that performance requirements are met in an embedded system, you may have to organize the system to minimize communications between components.


Figure 4.3 Types of non-functional requirements. (Non-functional requirements divide into product, organizational, and external requirements. Product requirements include efficiency requirements, which cover performance and space requirements, plus dependability, security, and usability requirements. Organizational requirements include environmental, operational, and development requirements. External requirements include regulatory, ethical, and legislative requirements; legislative requirements cover accounting and safety/security requirements.)

2. An individual non-functional requirement, such as a security requirement, may generate several related functional requirements that define new system services that are required if the non-functional requirement is to be implemented. In addition, it may also generate requirements that constrain existing requirements; for example, it may limit access to information in the system.

Non-functional requirements arise through user needs, because of budget constraints, organizational policies, the need for interoperability with other software or hardware systems, or external factors such as safety regulations or privacy legislation. Figure 4.3 is a classification of non-functional requirements. You can see from this diagram that the non-functional requirements may come from required characteristics of the software (product requirements), the organization developing the software (organizational requirements), or external sources:

1. Product requirements These requirements specify or constrain the runtime behavior of the software. Examples include performance requirements for how fast the system must execute and how much memory it requires; reliability requirements that set out the acceptable failure rate; security requirements; and usability requirements.

2. Organizational requirements These requirements are broad system requirements derived from policies and procedures in the customer's and developer's organizations. Examples include operational process requirements that define how the system will be used; development process requirements that specify the programming language, the development environment, or process standards to be used; and environmental requirements that specify the operating environment of the system.

3. External requirements This broad heading covers all requirements that are derived from factors external to the system and its development process. These may include regulatory requirements that set out what must be done for the system to be approved for use by a regulator, such as a nuclear safety authority; legislative requirements that must be followed to ensure that the system operates within the law; and ethical requirements that ensure that the system will be acceptable to its users and the general public.

Figure 4.4 Examples of possible non-functional requirements for the Mentcare system

Product requirement
The Mentcare system shall be available to all clinics during normal working hours (Mon–Fri, 08:30–17:30). Downtime within normal working hours shall not exceed 5 seconds in any one day.

Organizational requirement
Users of the Mentcare system shall identify themselves using their health authority identity card.

External requirement
The system shall implement patient privacy provisions as set out in HStan-03-2006-priv.

Figure 4.4 shows examples of product, organizational, and external requirements that could be included in the Mentcare system specification. The product requirement is an availability requirement that defines when the system has to be available and the allowed downtime each day. It says nothing about the functionality of the Mentcare system and clearly identifies a constraint that has to be considered by the system designers.
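The downtime limit in this product requirement is directly checkable. As a rough illustration (the figures below simply restate the requirement; they are not measured data), 5 seconds of downtime in a 9-hour working day corresponds to an availability of roughly 99.985% during working hours:

```python
# Working hours are Mon-Fri, 08:30-17:30, i.e. 9 hours per day.
working_seconds_per_day = 9 * 3600          # 32,400 s
max_downtime_seconds = 5                    # from the product requirement

# Worst-case availability within working hours that still meets the requirement.
availability = 1 - max_downtime_seconds / working_seconds_per_day
print(f"{availability:.5%}")                # about 99.985% within working hours
```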

The organizational requirement specifies how users authenticate themselves to the system. The health authority that operates the system is moving to a standard authentication procedure for all software where, instead of users having a login name, they swipe their identity card through a reader to identify themselves. The external requirement is derived from the need for the system to conform to privacy legislation. Privacy is obviously a very important issue in health care systems, and the requirement specifies that the system should be developed in accordance with a national privacy standard.

A common problem with non-functional requirements is that stakeholders propose requirements as general goals, such as ease of use, the ability of the system to recover from failure, or rapid user response. Goals set out good intentions but cause problems for system developers as they leave scope for interpretation and subsequent dispute once the system is delivered. For example, the following system goal is typical of how a manager might express usability requirements:

The system should be easy to use by medical staff and should be organized in such a way that user errors are minimized.


Figure 4.5 Metrics for specifying non-functional requirements

Property     Measure
Speed        Processed transactions/second; user/event response time; screen refresh time
Size         Megabytes; number of ROM chips
Ease of use  Training time; number of help frames
Reliability  Mean time to failure; probability of unavailability; rate of failure occurrence; availability
Robustness   Time to restart after failure; percentage of events causing failure; probability of data corruption on failure
Portability  Percentage of target-dependent statements; number of target systems

I have rewritten this to show how the goal could be expressed as a "testable" non-functional requirement. It is impossible to objectively verify the system goal, but in the following description you can at least include software instrumentation to count the errors made by users when they are testing the system.

Medical staff shall be able to use all the system functions after two hours of training. After this training, the average number of errors made by experienced users shall not exceed two per hour of system use.
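The instrumentation mentioned above can be very simple. This sketch (the class and method names are my own, and it assumes the application can call a hook whenever a user error is detected) accumulates errors over a timed trial session and computes the rate that the requirement caps at two per hour:

```python
import time

class ErrorRateMonitor:
    """Illustrative error-counting instrumentation for a usability trial."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable clock, so trials are testable
        self._start = clock()
        self._errors = 0

    def record_error(self):
        """Called by the (hypothetical) application on each user error."""
        self._errors += 1

    def errors_per_hour(self):
        elapsed_hours = (self._clock() - self._start) / 3600
        if elapsed_hours <= 0:
            return float("inf") if self._errors else 0.0
        return self._errors / elapsed_hours

    def meets_requirement(self, limit=2.0):
        """Check the 'no more than two errors per hour' threshold."""
        return self.errors_per_hour() <= limit
```

The point is not the code itself but that the rewritten requirement gives testers a concrete pass/fail criterion.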

Whenever possible, you should write non-functional requirements quantitatively so that they can be objectively tested. Figure 4.5 shows metrics that you can use to specify non-functional system properties. You can measure these characteristics when the system is being tested to check whether or not the system has met its non-functional requirements.
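For instance, two of the reliability measures in Figure 4.5 can be computed straightforwardly from a test log. The failure times below are invented data, used only to show the arithmetic:

```python
# Elapsed operating hours at which each failure was observed (invented data).
failure_times = [120.0, 260.0, 450.0]
total_test_hours = 500.0

# Operating time between successive failures.
intervals = [t - prev for prev, t in zip([0.0] + failure_times, failure_times)]

mean_time_to_failure = sum(intervals) / len(intervals)              # 150.0 hours
rate_of_failure_occurrence = len(failure_times) / total_test_hours  # 0.006 failures/hour
```

A reliability requirement could then be checked by comparing these measured values against the thresholds stated in the specification.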

In practice, customers for a system often find it difficult to translate their goals into measurable requirements. For some goals, such as maintainability, there are no simple metrics that can be used. In other cases, even when quantitative specification is possible, customers may not be able to relate their needs to these specifications. They don't understand what some number defining the reliability (for example) means in terms of their everyday experience with computer systems. Furthermore, the cost of objectively verifying measurable non-functional requirements can be very high, and the customers paying for the system may not think these costs are justified.

Non-functional requirements often conflict and interact with other functional or non-functional requirements. For example, the identification requirement in Figure 4.4 requires a card reader to be installed with each computer that connects to the system. However, there may be another requirement that requests mobile access to the system from doctors' or nurses' tablets or smartphones. These are not normally equipped with card readers so, in these circumstances, some alternative identification method may have to be supported.

It is difficult to separate functional and non-functional requirements in the requirements document. If the non-functional requirements are stated separately from the functional requirements, the relationships between them may be hard to understand. However, you should, ideally, highlight requirements that are clearly related to emergent system properties, such as performance or reliability. You can do this by putting them in a separate section of the requirements document or by distinguishing them, in some way, from other system requirements.

Non-functional requirements such as reliability, safety, and confidentiality requirements are particularly important for critical systems. I cover these dependability requirements in Part 2, which describes ways of specifying reliability, safety, and security requirements.

4.2 Requirements engineering processes

As I discussed in Chapter 2, requirements engineering involves three key activities. These are discovering requirements by interacting with stakeholders (elicitation and analysis); converting these requirements into a standard form (specification); and checking that the requirements actually define the system that the customer wants (validation). I have shown these as sequential processes in Figure 2.4. However, in practice, requirements engineering is an iterative process in which the activities are interleaved.

Figure 4.6 shows this interleaving. The activities are organized as an iterative process around a spiral. The output of the RE process is a system requirements document. The amount of time and effort devoted to each activity in an iteration depends on the stage of the overall process, the type of system being developed, and the budget that is available.

Early in the process, most effort will be spent on understanding high-level business and non-functional requirements, and the user requirements for the system. Later in the process, in the outer rings of the spiral, more effort will be devoted to eliciting and understanding the non-functional requirements and more detailed system requirements.

This spiral model accommodates approaches to development where the requirements are developed to different levels of detail. The number of iterations around the spiral can vary, so the spiral can be exited after some or all of the user requirements have been elicited. Agile development can be used instead of prototyping so that the requirements and the system implementation are developed together.

In virtually all systems, requirements change. The people involved develop a better understanding of what they want the software to do; the organization buying the system changes; and modifications are made to the system's hardware, software, and organizational environment. Changes have to be managed to understand the impact on other requirements and the cost and system implications of making the change. I discuss this process of requirements management in Section 4.6.


Figure 4.6 A spiral view of the requirements engineering process. (The process starts with a feasibility study and spirals outward through repeated elicitation, specification, and validation activities: requirements elicitation (user requirements elicitation, then system requirements elicitation), requirements specification (business requirements specification, user requirements specification, then system requirements specification and modeling), and requirements validation (supported by prototyping and reviews). The output is the system requirements document.)

4.3 Requirements elicitation

The aims of the requirements elicitation process are to understand the work that stakeholders do and how they might use a new system to help support that work. During requirements elicitation, software engineers work with stakeholders to find out about the application domain, work activities, the services and system features that stakeholders want, the required performance of the system, hardware constraints, and so on.

Eliciting and understanding requirements from system stakeholders is a difficult process for several reasons:

1. Stakeholders often don't know what they want from a computer system except in the most general terms; they may find it difficult to articulate what they want the system to do; they may make unrealistic demands because they don't know what is and isn't feasible.


Figure 4.7 The requirements elicitation and analysis process. (A cycle of four activities: 1. requirements discovery and understanding; 2. requirements classification and organization; 3. requirements prioritization and negotiation; 4. requirements documentation.)

2. Stakeholders in a system naturally express requirements in their own terms and with implicit knowledge of their own work. Requirements engineers, without experience in the customer's domain, may not understand these requirements.

3. Different stakeholders, with diverse requirements, may express their requirements in different ways. Requirements engineers have to discover all potential sources of requirements and discover commonalities and conflict.

4. Political factors may influence the requirements of a system. Managers may demand specific system requirements because these will allow them to increase their influence in the organization.

5. The economic and business environment in which the analysis takes place is dynamic. It inevitably changes during the analysis process. The importance of particular requirements may change. New requirements may emerge from new stakeholders who were not originally consulted.

A process model of the elicitation and analysis process is shown in Figure 4.7. Each organization will have its own version or instantiation of this general model, depending on local factors such as the expertise of the staff, the type of system being developed, and the standards used.

The process activities are:

1. Requirements discovery and understanding This is the process of interacting with stakeholders of the system to discover their requirements. Domain requirements from stakeholders and documentation are also discovered during this activity.

2. Requirements classification and organization This activity takes the unstructured collection of requirements, groups related requirements, and organizes them into coherent clusters.

3. Requirements prioritization and negotiation Inevitably, when multiple stakeholders are involved, requirements will conflict. This activity is concerned with prioritizing requirements and finding and resolving requirements conflicts through negotiation. Usually, stakeholders have to meet to resolve differences and agree on compromise requirements.

4. Requirements documentation The requirements are documented and input into the next round of the spiral. An early draft of the software requirements documents may be produced at this stage, or the requirements may simply be maintained informally on whiteboards, wikis, or other shared spaces.

Viewpoints

A viewpoint is a way of collecting and organizing a set of requirements from a group of stakeholders who have something in common. Each viewpoint therefore includes a set of system requirements. Viewpoints might come from end-users, managers, or others. They help identify the people who can provide information about their requirements and structure the requirements for analysis.

http://www.software-engineering-book.com/web/viewpoints/

Figure 4.7 shows that requirements elicitation and analysis is an iterative process with continual feedback from each activity to other activities. The process cycle starts with requirements discovery and ends with the requirements documentation. The analyst's understanding of the requirements improves with each round of the cycle. The cycle ends when the requirements document has been produced.

To simplify the analysis of requirements, it is helpful to organize and group the stakeholder information. One way of doing so is to consider each stakeholder group to be a viewpoint and to collect all requirements from that group into the viewpoint. You may also include viewpoints to represent domain requirements and constraints from other systems. Alternatively, you can use a model of the system architecture to identify subsystems and to associate requirements with each subsystem.
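Collecting requirements into viewpoints amounts to grouping them by stakeholder group. A toy sketch of this organization (the stakeholder groups and requirement texts below are invented, not from the Mentcare specification):

```python
from collections import defaultdict

# Elicited (stakeholder group, requirement) pairs; all entries are invented.
elicited = [
    ("nurses", "View the list of patients expected at each clinic today."),
    ("receptionists", "Book, amend, and cancel patient appointments."),
    ("nurses", "Record the treatments administered at a consultation."),
    ("managers", "Generate the monthly drug cost report."),
]

# Each viewpoint collects all requirements from one stakeholder group.
viewpoints = defaultdict(list)
for group, requirement in elicited:
    viewpoints[group].append(requirement)

for group in sorted(viewpoints):
    print(group, "->", len(viewpoints[group]), "requirement(s)")
```

Grouping like this makes overlaps and conflicts between stakeholder groups easier to spot during analysis.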

Inevitably, different stakeholders have different views on the importance and priority of requirements, and sometimes these views are conflicting. If some stakeholders feel that their views have not been properly considered, then they may deliberately attempt to undermine the RE process. Therefore, it is important that you organize regular stakeholder meetings. Stakeholders should have the opportunity to express their concerns and agree on requirements compromises.

At the requirements documentation stage, it is important that you use simple language and diagrams to describe the requirements. This makes it possible for stakeholders to understand and comment on these requirements. To make information sharing easier, it is best to use a shared document (e.g., on Google Docs or Office 365) or a wiki that is accessible to all interested stakeholders.

4.3.1 Requirements elicitation techniques

Requirements elicitation involves meeting with stakeholders of different kinds to discover information about the proposed system. You may supplement this information with knowledge of existing systems and their usage and information from documents of various kinds. You need to spend time understanding how people work, what they produce, how they use other systems, and how they may need to change to accommodate a new system.

There are two fundamental approaches to requirements elicitation:

1. Interviewing, where you talk to people about what they do.

2. Observation or ethnography, where you watch people doing their job to see what artifacts they use, how they use them, and so on.

You should use a mix of interviewing and observation to collect information and, from that, you derive the requirements, which are then the basis for further discussions.

4.3.1.1 Interviewing

Formal or informal interviews with system stakeholders are part of most requirements engineering processes. In these interviews, the requirements engineering team puts questions to stakeholders about the system that they currently use and the system to be developed. Requirements are derived from the answers to these questions.

Interviews may be of two types:

1. Closed interviews, where the stakeholder answers a predefined set of questions.

2. Open interviews, in which there is no predefined agenda. The requirements engineering team explores a range of issues with system stakeholders and hence develops a better understanding of their needs.

In practice, interviews with stakeholders are normally a mixture of both of these. You may have to obtain the answer to certain questions, but these usually lead to other issues that are discussed in a less structured way. Completely open-ended discussions rarely work well. You usually have to ask some questions to get started and to keep the interview focused on the system to be developed.

Interviews are good for getting an overall understanding of what stakeholders do, how they might interact with the new system, and the difficulties that they face with current systems. People like talking about their work, and so they are usually happy to get involved in interviews. However, unless you have a system prototype to demonstrate, you should not expect stakeholders to suggest specific and detailed requirements. Everyone finds it difficult to visualize what a system might be like. You need to analyze the information collected and to generate the requirements from this.

Eliciting domain knowledge through interviews can be difficult, for two reasons:

1. All application specialists use jargon specific to their area of work. It is impossible for them to discuss domain requirements without using this terminology. They normally use words in a precise and subtle way that requirements engineers may misunderstand.

116 Chapter 4 Requirements engineering

2. Some domain knowledge is so familiar to stakeholders that they either find it difficult to explain or they think it is so fundamental that it isn't worth mentioning. For example, for a librarian, it goes without saying that all acquisitions are catalogued before they are added to the library. However, this may not be obvious to the interviewer, and so it isn't taken into account in the requirements.

Interviews are not an effective technique for eliciting knowledge about organizational requirements and constraints because there are subtle power relationships between the different people in the organization. Published organizational structures rarely match the reality of decision making in an organization, but interviewees may not wish to reveal the actual rather than the theoretical structure to a stranger. In general, people are reluctant to discuss political and organizational issues that may affect the requirements.

To be an effective interviewer, you should bear two things in mind:

1. You should be open-minded, avoid preconceived ideas about the requirements, and be willing to listen to stakeholders. If the stakeholder comes up with surprising requirements, then you should be willing to change your mind about the system.

2. You should prompt the interviewee to get discussions going by using a springboard question or a requirements proposal, or by working together on a prototype system. Saying to people "tell me what you want" is unlikely to result in useful information. They find it much easier to talk in a defined context rather than in general terms.

Information from interviews is used along with other information about the system from documentation describing business processes or existing systems, user observations, and developer experience. Sometimes, apart from the information in the system documents, the interview information may be the only source of information about the system requirements. However, interviewing on its own is liable to miss essential information, and so it should be used in conjunction with other requirements elicitation techniques.

4.3.1.2 Ethnography

Software systems do not exist in isolation. They are used in a social and organizational environment, and software system requirements may be generated or constrained by that environment. One reason why many software systems are delivered but never used is that their requirements do not take proper account of how social and organizational factors affect the practical operation of the system. It is therefore very important that, during the requirements engineering process, you try to understand the social and organizational issues that affect the use of the system.

Ethnography is an observational technique that can be used to understand operational processes and help derive requirements for software to support these processes. An analyst immerses himself or herself in the working environment where the system will be used. The day-to-day work is observed, and notes are made of the actual tasks in which participants are involved. The value of ethnography is that it helps discover implicit system requirements that reflect the actual ways that people work, rather than the formal processes defined by the organization.

People often find it very difficult to articulate details of their work because it is second nature to them. They understand their own work but may not understand its relationship to other work in the organization. Social and organizational factors that affect the work, but that are not obvious to individuals, may only become clear when noticed by an unbiased observer. For example, a workgroup may self-organize so that members know of each other's work and can cover for each other if someone is absent. This may not be mentioned during an interview as the group might not see it as an integral part of their work.

Suchman (Suchman 1983) pioneered the use of ethnography to study office work. She found that actual work practices were far richer, more complex, and more dynamic than the simple models assumed by office automation systems. The difference between the assumed and the actual work was the most important reason why these office systems had no significant effect on productivity. Crabtree (Crabtree 2003) discusses a wide range of studies since then and describes, in general, the use of ethnography in systems design. In my own research, I have investigated methods of integrating ethnography into the software engineering process by linking it with requirements engineering methods (Viller and Sommerville 2000) and documenting patterns of interaction in cooperative systems (Martin and Sommerville 2004).

Ethnography is particularly effective for discovering two types of requirements:

1. Requirements derived from the way in which people actually work, rather than the way in which business process definitions say they ought to work. In practice, people never follow formal processes. For example, air traffic controllers may switch off a conflict alert system that detects aircraft with intersecting flight paths, even though normal control procedures specify that it should be used. The conflict alert system is sensitive and issues audible warnings even when planes are far apart. Controllers may find these distracting and prefer to use other strategies to ensure that planes are not on conflicting flight paths.

2. Requirements derived from cooperation and awareness of other people's activities. For example, air traffic controllers (ATCs) may use an awareness of other controllers' work to predict the number of aircraft that will be entering their control sector. They then modify their control strategies depending on that predicted workload. Therefore, an automated ATC system should allow controllers in a sector to have some visibility of the work in adjacent sectors.

Ethnography can be combined with the development of a system prototype (Figure 4.8). The ethnography informs the development of the prototype so that fewer prototype refinement cycles are required. Furthermore, the prototyping focuses the ethnography by identifying problems and questions that can then be discussed with the ethnographer. He or she should then look for the answers to these questions during the next phase of the system study (Sommerville et al. 1993).


Figure 4.8 Ethnography and prototyping for requirements analysis. (The figure shows a cycle of ethnographic analysis, debriefing meetings, focused ethnography, and prototype evaluation, which feeds generic system development and system prototyping.)

Ethnography is helpful to understand existing systems, but this understanding does not always help with innovation. Innovation is particularly relevant for new product development. Commentators have suggested that Nokia used ethnography to discover how people used their phones and developed new phone models on that basis; Apple, on the other hand, ignored current use and revolutionized the mobile phone industry with the introduction of the iPhone.

Ethnographic studies can reveal critical process details that are often missed by other requirements elicitation techniques. However, because of its focus on the end-user, this approach is not effective for discovering broader organizational or domain requirements or for suggesting innovations. You therefore have to use ethnography as one of a number of techniques for requirements elicitation.

4.3.2 Stories and scenarios

People find it easier to relate to real-life examples than abstract descriptions, but they are not good at telling you the system requirements. However, they may be able to describe how they handle particular situations or imagine things that they might do in a new way of working. Stories and scenarios are ways of capturing this kind of information. You can then use these when interviewing groups of stakeholders to discuss the system with other stakeholders and to develop more specific system requirements.

Stories and scenarios are essentially the same thing. They are a description of how the system can be used for some particular task. They describe what people do, what information they use and produce, and what systems they may use in this process. The difference is in the ways that descriptions are structured and in the level of detail presented. Stories are written as narrative text and present a high-level description of system use; scenarios are usually structured with specific information collected, such as inputs and outputs. I find stories to be effective in setting out the "big picture." Parts of stories can then be developed in more detail and represented as scenarios.

Figure 4.9 is an example of a story that I developed to understand the requirements for the iLearn digital learning environment that I introduced in Chapter 1. This story describes a situation in a primary (elementary) school where the teacher is using the environment to support student projects on the fishing industry. You can see that this is a very high-level description. Its purpose is to facilitate discussion of how the iLearn system might be used and to act as a starting point for eliciting the requirements for that system.


Photo sharing in the classroom

Jack is a primary school teacher in Ullapool (a village in northern Scotland). He has decided that a class project should be focused on the fishing industry in the area, looking at the history, development, and economic impact of fishing. As part of this project, pupils are asked to gather and share reminiscences from relatives, use newspaper archives, and collect old photographs related to fishing and fishing communities in the area. Pupils use an iLearn wiki to gather together fishing stories and SCRAN (a history resources site) to access newspaper archives and photographs. However, Jack also needs a photo-sharing site because he wants pupils to take and comment on each other's photos and to upload scans of old photographs that they may have in their families.

Jack sends an email to a primary school teachers' group, which he is a member of, to see if anyone can recommend an appropriate system. Two teachers reply, and both suggest that he use KidsTakePics, a photo-sharing site that allows teachers to check and moderate content. As KidsTakePics is not integrated with the iLearn authentication service, he sets up a teacher and a class account. He uses the iLearn setup service to add KidsTakePics to the services seen by the pupils in his class so that when they log in, they can immediately use the system to upload photos from their mobile devices and class computers.

Figure 4.9 A user story for the iLearn system

The advantage of stories is that everyone can easily relate to them. We found this approach to be particularly useful to get information from a wider community than we could realistically interview. We made the stories available on a wiki and invited teachers and students from across the country to comment on them.

These high-level stories do not go into detail about a system, but they can be developed into more specific scenarios. Scenarios are descriptions of example user interaction sessions. I think that it is best to present scenarios in a structured way rather than as narrative text. User stories, as used in agile methods such as Extreme Programming, are actually narrative scenarios rather than general stories to help elicit requirements.

A scenario starts with an outline of the interaction. During the elicitation process, details are added to create a complete description of that interaction. At its most general, a scenario may include:

1. A description of what the system and users expect when the scenario starts.

2. A description of the normal flow of events in the scenario.

3. A description of what can go wrong and how resulting problems can be handled.

4. Information about other activities that might be going on at the same time.

5. A description of the system state when the scenario ends.
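The five elements above amount to a template, and it can be useful to capture them in a machine-readable form so that scenarios can be checked for missing parts. The sketch below is a hypothetical helper, not part of the book's material; the field names simply mirror the list above.

```python
from dataclasses import dataclass, field

@dataclass
class Scenario:
    """A structured scenario record; fields mirror the five elements above."""
    name: str
    initial_assumption: str                 # what the system and users expect at the start
    normal_flow: list[str] = field(default_factory=list)       # normal sequence of events
    what_can_go_wrong: list[str] = field(default_factory=list) # exceptions and their handling
    other_activities: list[str] = field(default_factory=list)  # concurrent activities
    state_on_completion: str = ""           # system state when the scenario ends

# Example instance, loosely based on the KidsTakePics scenario discussed later.
upload = Scenario(
    name="Uploading photos to KidsTakePics",
    initial_assumption="User is logged on and has photos to upload.",
    normal_flow=["Select photos", "Choose project name", "Add keywords"],
    what_can_go_wrong=["No moderator associated with the project"],
    state_on_completion="Photos uploaded and awaiting moderation.",
)
assert len(upload.normal_flow) == 3
```

A team could extend this with a completeness check that flags scenarios whose exception or completion fields are still empty.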

As an example of a scenario, Figure 4.10 describes what happens when a student uploads photos to the KidsTakePics system, as explained in Figure 4.9. The key difference between this system and other systems is that a teacher moderates the uploaded photos to check that they are suitable for sharing.

You can see that this is a much more detailed description than the story in Figure 4.9, and so it can be used to propose requirements for the iLearn system. Like stories, scenarios can be used to facilitate discussions with stakeholders who sometimes may have different ways of achieving the same result.


Uploading photos to KidsTakePics

Initial assumption: A user or a group of users have one or more digital photographs to be uploaded to the picture-sharing site. These photos are saved on either a tablet or a laptop computer. They have successfully logged on to KidsTakePics.

Normal: The user chooses to upload photos and is prompted to select the photos to be uploaded on the computer and to select the project name under which the photos will be stored. Users should also be given the option of inputting keywords that should be associated with each uploaded photo. Uploaded photos are named by creating a conjunction of the user name with the filename of the photo on the local computer.

On completion of the upload, the system automatically sends an email to the project moderator, asking them to check new content, and generates an on-screen message to the user that this has been done.

What can go wrong: No moderator is associated with the selected project. An email is automatically generated to the school administrator asking them to nominate a project moderator. Users should be informed of a possible delay in making their photos visible.

Photos with the same name have already been uploaded by the same user. The user should be asked if he or she wishes to re-upload the photos with the same name, rename the photos, or cancel the upload. If users choose to re-upload the photos, the originals are overwritten. If they choose to rename the photos, a new name is automatically generated by adding a number to the existing filename.

Other activities: The moderator may be logged on to the system and may approve photos as they are uploaded.

System state on completion: User is logged on. The selected photos have been uploaded and assigned a status "awaiting moderation." Photos are visible to the moderator and to the user who uploaded them.

Figure 4.10 Scenario for uploading photos in KidsTakePics
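The naming rule in this scenario (join the user name with the local filename; on a clash, rename by adding a number) is precise enough to sketch in code. The sketch below is illustrative only: the exact separator and numbering format are assumptions, since the scenario does not specify them.

```python
def photo_name(user: str, filename: str, existing: set[str]) -> str:
    """Name an uploaded photo as the scenario describes: the conjunction of
    the user name and the local filename, with a number appended when the
    user chooses to rename rather than overwrite a clashing photo.
    (Separator and numbering scheme are assumptions.)"""
    base = f"{user}_{filename}"
    if base not in existing:
        return base
    n = 1
    while f"{base}_{n}" in existing:  # find the first free numbered name
        n += 1
    return f"{base}_{n}"

taken = {"jack_harbour.jpg"}
assert photo_name("jack", "boats.jpg", taken) == "jack_boats.jpg"
assert photo_name("jack", "harbour.jpg", taken) == "jack_harbour.jpg_1"
```

Working through a rule like this often surfaces questions (What is the separator? Is renaming case-sensitive?) that can be taken back to stakeholders, which is exactly the purpose of a scenario.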

4.4 Requirements specification

Requirements specification is the process of writing down the user and system requirements in a requirements document. Ideally, the user and system requirements should be clear, unambiguous, easy to understand, complete, and consistent. In practice, this is almost impossible to achieve. Stakeholders interpret the requirements in different ways, and there are often inherent conflicts and inconsistencies in the requirements.

User requirements are almost always written in natural language supplemented by appropriate diagrams and tables in the requirements document. System requirements may also be written in natural language, but other notations based on forms, graphical system models, or mathematical system models can also be used. Figure 4.11 summarizes possible notations for writing system requirements.

The user requirements for a system should describe the functional and nonfunctional requirements so that they are understandable by system users who don't have detailed technical knowledge. Ideally, they should specify only the external behavior of the system. The requirements document should not include details of the system architecture or design. Consequently, if you are writing user requirements, you should not use software jargon, structured notations, or formal notations. You should write user requirements in natural language, with simple tables, forms, and intuitive diagrams.


Figure 4.11 Notations for writing system requirements

Natural language sentences: The requirements are written using numbered sentences in natural language. Each sentence should express one requirement.

Structured natural language: The requirements are written in natural language on a standard form or template. Each field provides information about an aspect of the requirement.

Graphical notations: Graphical models, supplemented by text annotations, are used to define the functional requirements for the system. UML (Unified Modeling Language) use case and sequence diagrams are commonly used.

Mathematical specifications: These notations are based on mathematical concepts such as finite-state machines or sets. Although these unambiguous specifications can reduce the ambiguity in a requirements document, most customers don't understand a formal specification. They cannot check that it represents what they want, and they are reluctant to accept it as a system contract. (I discuss this approach in Chapter 10, which covers system dependability.)

System requirements are expanded versions of the user requirements that software engineers use as the starting point for the system design. They add detail and explain how the system should provide the user requirements. They may be used as part of the contract for the implementation of the system and should therefore be a complete and detailed specification of the whole system.

Ideally, the system requirements should only describe the external behavior of the system and its operational constraints. They should not be concerned with how the system should be designed or implemented. However, at the level of detail required to completely specify a complex software system, it is neither possible nor desirable to exclude all design information. There are several reasons for this:

1. You may have to design an initial architecture of the system to help structure the requirements specification. The system requirements are organized according to the different subsystems that make up the system. We did this when we were defining the requirements for the iLearn system, where we proposed the architecture shown in Figure 1.8.

2. In most cases, systems must interoperate with existing systems, which constrain the design and impose requirements on the new system.

3. The use of a specific architecture to satisfy non-functional requirements, such as N-version programming to achieve reliability, discussed in Chapter 11, may be necessary. An external regulator who needs to certify that the system is safe may specify that an architectural design that has already been certified should be used.

4.4.1 Natural language specification

Natural language has been used to write requirements for software since the 1950s. It is expressive, intuitive, and universal. It is also potentially vague and ambiguous, and its interpretation depends on the background of the reader. As a result, there have been many proposals for alternative ways to write requirements. However, none of these proposals has been widely adopted, and natural language will continue to be the most widely used way of specifying system and software requirements.

3.2 The system shall measure the blood sugar and deliver insulin, if required, every 10 minutes. (Changes in blood sugar are relatively slow, so more frequent measurement is unnecessary; less frequent measurement could lead to unnecessarily high sugar levels.)

3.6 The system shall run a self-test routine every minute with the conditions to be tested and the associated actions defined in Table 1. (A self-test routine can discover hardware and software problems and alert the user to the fact that normal operation may be impossible.)

Figure 4.12 Example requirements for the insulin pump software system

To minimize misunderstandings when writing natural language requirements, I recommend that you follow these simple guidelines:

1. Invent a standard format and ensure that all requirement definitions adhere to that format. Standardizing the format makes omissions less likely and requirements easier to check. I suggest that, wherever possible, you should write the requirement in one or two sentences of natural language.

2. Use language consistently to distinguish between mandatory and desirable requirements. Mandatory requirements are requirements that the system must support and are usually written using "shall." Desirable requirements are not essential and are written using "should."

3. Use text highlighting (bold, italic, or color) to pick out key parts of the requirement.

4. Do not assume that readers understand technical, software engineering language. It is easy for words such as "architecture" and "module" to be misunderstood. Wherever possible, you should avoid the use of jargon, abbreviations, and acronyms.

5. Whenever possible, you should try to associate a rationale with each user requirement. The rationale should explain why the requirement has been included and who proposed the requirement (the requirement source), so that you know whom to consult if the requirement has to be changed. Requirements rationale is particularly useful when requirements are changed, as it may help decide what changes would be undesirable.
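Guidelines 1 and 2 are mechanical enough that simple checks can be automated. The sketch below is a hypothetical checker, not a tool described in the book: it assumes a numbered-identifier format and flags requirements that run over two sentences or mix "shall" with "should."

```python
import re

def check_requirement(req_id: str, text: str) -> list[str]:
    """Return a list of problems found against guidelines 1 and 2:
    a numbered identifier, one or two sentences, and consistent use
    of mandatory 'shall' versus desirable 'should'."""
    problems = []
    if not re.fullmatch(r"\d+(\.\d+)*", req_id):
        problems.append("identifier is not a numbered label")
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    if len(sentences) > 2:
        problems.append("more than two sentences")
    has_shall = " shall " in f" {text} "
    has_should = " should " in f" {text} "
    if has_shall and has_should:
        problems.append("mixes mandatory 'shall' with desirable 'should'")
    if not (has_shall or has_should):
        problems.append("no 'shall' (mandatory) or 'should' (desirable)")
    return problems

assert check_requirement("3.2", "The system shall measure the blood sugar every 10 minutes.") == []
assert "no 'shall' (mandatory) or 'should' (desirable)" in check_requirement("3.9", "The system measures blood sugar.")
```

Such a check cannot judge whether a requirement is right, but it cheaply enforces the standard format that makes omissions easier to spot.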

Figure 4.12 illustrates how these guidelines may be used. It includes two requirements for the embedded software for the automated insulin pump, introduced in Chapter 1. Other requirements for this embedded system are defined in the insulin pump requirements document, which can be downloaded from the book's web pages.

4.4.2 Structured specifications

Structured natural language is a way of writing system requirements where requirements are written in a standard way rather than as free-form text. This approach maintains most of the expressiveness and understandability of natural language but ensures that some uniformity is imposed on the specification. Structured language notations use templates to specify system requirements. The specification may use programming language constructs to show alternatives and iteration, and may highlight key elements using shading or different fonts.

Problems with using natural language for requirements specification

The flexibility of natural language, which is so useful for specification, often causes problems. There is scope for writing unclear requirements, and readers (the designers) may misinterpret requirements because they have a different background to the user. It is easy to amalgamate several requirements into a single sentence, and structuring natural language requirements can be difficult.

http://software-engineering-book.com/web/natural-language/

The Robertsons (Robertson and Robertson 2013), in their book on the VOLERE requirements engineering method, recommend that user requirements be initially written on cards, one requirement per card. They suggest a number of fields on each card, such as the requirements rationale, the dependencies on other requirements, the source of the requirements, and supporting materials. This is similar to the approach used in the example of a structured specification shown in Figure 4.13.

To use a structured approach to specifying system requirements, you define one or more standard templates for requirements and represent these templates as structured forms. The specification may be structured around the objects manipulated by the system, the functions performed by the system, or the events processed by the system. An example of a form-based specification, in this case one that defines how to calculate the dose of insulin to be delivered when the blood sugar is within a safe band, is shown in Figure 4.13.

When a standard format is used for specifying functional requirements, the following information should be included:

1. A description of the function or entity being specified.

2. A description of its inputs and the origin of these inputs.

3. A description of its outputs and the destination of these outputs.

4. Information about the information needed for the computation or other entities in the system that are required (the "requires" part).

5. A description of the action to be taken.

6. If a functional approach is used, a precondition setting out what must be true before the function is called, and a postcondition specifying what is true after the function is called.

7. A description of the side effects (if any) of the operation.
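A form template like this can itself be represented as data, so that incomplete forms are easy to detect. The renderer below is a hypothetical sketch (field names follow the list above; the TBD convention is an assumption, not part of any standard).

```python
# Field order for a form-based functional requirement, per the list above.
FIELDS = [
    "Function", "Description", "Inputs", "Source", "Outputs",
    "Destination", "Action", "Requires", "Precondition",
    "Postcondition", "Side effects",
]

def render_form(req_ref: str, values: dict[str, str]) -> str:
    """Render a form-based requirement, one line per field in the standard
    order; fields not yet filled in appear as TBD so omissions are visible."""
    lines = [req_ref]
    for f in FIELDS:
        lines.append(f"{f}: {values.get(f, 'TBD')}")
    return "\n".join(lines)

form = render_form(
    "Insulin Pump/Control Software/SRS/3.3.2",
    {"Function": "Compute insulin dose: Safe sugar level.",
     "Side effects": "None."},
)
assert "Precondition: TBD" in form
assert form.splitlines()[0] == "Insulin Pump/Control Software/SRS/3.3.2"
```

Forcing every field to appear, even as TBD, gives a reviewer an explicit checklist rather than an invisible gap.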

Using structured specifications removes some of the problems of natural language specification. Variability in the specification is reduced, and requirements are organized more effectively. However, it is still sometimes difficult to write requirements in a clear and unambiguous way, particularly when complex computations (e.g., how to calculate the insulin dose) are to be specified.

Insulin Pump/Control Software/SRS/3.3.2

Function: Compute insulin dose: Safe sugar level.

Description: Computes the dose of insulin to be delivered when the current measured sugar level is in the safe zone between 3 and 7 units.

Inputs: Current sugar reading (r2), the previous two readings (r0 and r1).

Source: Current sugar reading from sensor. Other readings from memory.

Outputs: CompDose—the dose of insulin to be delivered.

Destination: Main control loop.

Action: CompDose is zero if the sugar level is stable or falling or if the level is increasing but the rate of increase is decreasing. If the level is increasing and the rate of increase is increasing, then CompDose is computed by dividing the difference between the current sugar level and the previous level by 4 and rounding the result. If the result rounds to zero, CompDose is set to the minimum dose that can be delivered. (See Figure 4.14.)

Requires: Two previous readings so that the rate of change of sugar level can be computed.

Precondition: The insulin reservoir contains at least the maximum allowed single dose of insulin.

Postcondition: r0 is replaced by r1, then r1 is replaced by r2.

Side effects: None.

Figure 4.13 The structured specification of a requirement for an insulin pump

To address this problem, you can add extra information to natural language requirements, for example, by using tables or graphical models of the system. These can show how computations proceed, how the system state changes, how users interact with the system, and how sequences of actions are performed.

Tables are particularly useful when there are a number of possible alternative situations and you need to describe the actions to be taken for each of these. The insulin pump bases its computations of the insulin requirement on the rate of change of blood sugar levels. The rates of change are computed using the current and previous readings. Figure 4.14 is a tabular description of how the rate of change of blood sugar is used to calculate the amount of insulin to be delivered.

Condition: Sugar level falling (r2 < r1). Action: CompDose = 0.

Condition: Sugar level stable (r2 = r1). Action: CompDose = 0.

Condition: Sugar level increasing and rate of increase decreasing ((r2 − r1) < (r1 − r0)). Action: CompDose = 0.

Condition: Sugar level increasing and rate of increase stable or increasing (r2 > r1 and (r2 − r1) ≥ (r1 − r0)). Action: CompDose = round((r2 − r1)/4); if the rounded result = 0, then CompDose = MinimumDose.

Figure 4.14 The tabular specification of computation in an insulin pump
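One strength of a tabular specification is that its rows translate almost directly into code, which makes the specification easy to validate against examples. The sketch below is illustrative only: it assumes standard rounding and a configured minimum dose, since MinimumDose is not defined in this excerpt.

```python
def comp_dose(r0: float, r1: float, r2: float, minimum_dose: float = 1.0) -> float:
    """Compute CompDose from the last three sugar readings, row by row
    from the table in Figure 4.14. minimum_dose is an assumed constant."""
    if r2 <= r1:                      # sugar level falling or stable
        return 0.0
    if (r2 - r1) < (r1 - r0):         # increasing, but rate of increase decreasing
        return 0.0
    dose = round((r2 - r1) / 4)       # rate of increase stable or increasing
    return float(dose) if dose != 0 else minimum_dose

assert comp_dose(4, 4, 4) == 0.0      # stable
assert comp_dose(2, 4, 5) == 0.0      # rate of increase decreasing
assert comp_dose(4, 5, 9) == 1.0      # round(4/4) = 1
assert comp_dose(4, 5, 6) == 1.0      # rounds to 0, so minimum dose applies
```

Writing the table out like this also exposes edge cases worth querying (for instance, the exact rounding rule), which is the kind of ambiguity the surrounding text warns about.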


Figure 4.15 Use cases for the Mentcare system. (The diagram links the actors Medical receptionist, Manager, Nurse, and Doctor to the use cases Register patient, View personal info., Export statistics, Generate report, View record, Edit record, and Setup consultation.)

4.4.3 Use cases

Use cases are a way of describing interactions between users and a system using a graphical model and structured text. They were first introduced in the Objectory method (Jacobsen et al. 1993) and have now become a fundamental feature of the Unified Modeling Language (UML). In their simplest form, a use case identifies the actors involved in an interaction and names the type of interaction. You then add additional information describing the interaction with the system. The additional information may be a textual description or one or more graphical models such as UML sequence or state charts (see Chapter 5).

Use cases are documented using a high-level use case diagram. The set of use cases represents all of the possible interactions that will be described in the system requirements. Actors in the process, who may be human or other systems, are represented as stick figures. Each class of interaction is represented as a named ellipse. Lines link the actors with the interaction. Optionally, arrowheads may be added to lines to show how the interaction is initiated. This is illustrated in Figure 4.15, which shows some of the use cases for the Mentcare system.
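A use case diagram is, at heart, a relation between actors and interactions, and it can be recorded as plain data for checking or tooling. The sketch below is illustrative: the specific actor-to-use-case links are assumptions based on a reading of Figure 4.15, not a definitive transcription of it.

```python
# Actor -> interactions, based on Figure 4.15 (link assignments are assumptions).
USE_CASES = {
    "Medical receptionist": {"Register patient", "View personal info.", "Generate report"},
    "Manager": {"Export statistics"},
    "Nurse": {"View record", "Edit record"},
    "Doctor": {"View record", "Edit record", "Setup consultation"},
}

def actors_for(interaction: str) -> set[str]:
    """Return the actors linked to a given interaction in the model."""
    return {actor for actor, cases in USE_CASES.items() if interaction in cases}

assert actors_for("Edit record") == {"Nurse", "Doctor"}
assert actors_for("Setup consultation") == {"Doctor"}
```

Queries like actors_for make it easy to check, for example, that every interaction has at least one actor before the diagram is elaborated into textual descriptions.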

Use cases identify the individual interactions between the system and its users or other systems. Each use case should be documented with a textual description. These can then be linked to other models in the UML that will develop the scenario in more detail. For example, a brief description of the Setup consultation use case from Figure 4.15 might be:

Setup consultation allows two or more doctors, working in different offices, to view the same patient record at the same time. One doctor initiates the consultation by choosing the people involved from a dropdown menu of doctors who are online. The patient record is then displayed on their screens, but only the initiating doctor can edit the record. In addition, a text chat window is created to help coordinate actions. It is assumed that a phone call for voice communication can be separately arranged.

The UML is a standard for object-oriented modeling, so use cases and use case-based elicitation are used in the requirements engineering process. However, my experience with use cases is that they are too fine-grained to be useful in discussing requirements. Stakeholders don't understand the term "use case"; they don't find the graphical model to be useful, and they are often not interested in a detailed description of each and every system interaction. Consequently, I find use cases to be more helpful in systems design than in requirements engineering. I discuss use cases further in Chapter 5, which shows how they are used alongside other system models to document a system design.

Some people think that each use case is a single, low-level interaction scenario. Others, such as Stevens and Pooley (Stevens and Pooley 2006), suggest that each use case includes a set of related, low-level scenarios. Each of these scenarios is a single thread through the use case. Therefore, there would be a scenario for the normal interaction plus scenarios for each possible exception. In practice, you can use them in either way.
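The view of a use case as a set of related scenarios can be made concrete with a small sketch. The following Python is illustrative only: the class names, and the Setup Consultation steps paraphrased from the description above, are assumptions for the example rather than part of the UML or of any tool.

```python
from dataclasses import dataclass, field

@dataclass
class Scenario:
    """A single thread through a use case: one sequence of steps."""
    name: str
    steps: list[str]

@dataclass
class UseCase:
    """A use case as a set of related scenarios (the Stevens and Pooley view)."""
    name: str
    actors: list[str]
    normal: Scenario
    exceptions: list[Scenario] = field(default_factory=list)

    def all_scenarios(self) -> list[Scenario]:
        # The normal interaction plus one scenario per possible exception.
        return [self.normal, *self.exceptions]

setup_consultation = UseCase(
    name="Setup consultation",
    actors=["Initiating doctor", "Consulted doctors"],
    normal=Scenario("Normal flow", [
        "Initiating doctor selects online doctors from a dropdown menu",
        "Patient record is displayed on all participants' screens",
        "Only the initiating doctor may edit the record",
        "A text chat window is opened to coordinate actions",
    ]),
    exceptions=[
        Scenario("No doctors online", [
            "System reports that no consultation can be set up",
        ]),
    ],
)

print(len(setup_consultation.all_scenarios()))   # 2
```

Treating the use case as the unit and the scenarios as threads through it keeps the normal flow and its exceptions together, which is the structure a textual use-case description usually follows anyway.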

4.4.4 The software requirements document

The software requirements document (sometimes called the software requirements specification or SRS) is an official statement of what the system developers should implement. It may include both the user requirements for a system and a detailed specification of the system requirements. Sometimes the user and system requirements are integrated into a single description. In other cases, the user requirements are described in an introductory chapter in the system requirements specification.

Requirements documents are essential when systems are outsourced for development, when different teams develop different parts of the system, and when a detailed analysis of the requirements is mandatory. In other circumstances, such as software product or business system development, a detailed requirements document may not be needed.

Agile methods argue that requirements change so rapidly that a requirements document is out of date as soon as it is written, so the effort is largely wasted. Rather than a formal document, agile approaches often collect user requirements incrementally and write these on cards or whiteboards as short user stories. The user then prioritizes these stories for implementation in the next increment of the system.

For business systems where requirements are unstable, I think that this approach is a good one. However, I think that it is still useful to write a short supporting document that defines the business and dependability requirements for the system; it is easy to forget the requirements that apply to the system as a whole when focusing on the functional requirements for the next system release.

The requirements document has a diverse set of users, ranging from the senior management of the organization that is paying for the system to the engineers responsible for developing the software. Figure 4.16 shows possible users of the document and how they use it.


Figure 4.16 Users of a requirements document:

System customers: Specify the requirements and read them to check that they meet their needs. Customers specify changes to the requirements.

Managers: Use the requirements document to plan a bid for the system and to plan the system development process.

System engineers: Use the requirements to understand what system is to be developed.

System test engineers: Use the requirements to develop validation tests for the system.

System maintenance engineers: Use the requirements to understand the system and the relationships between its parts.

The diversity of possible users means that the requirements document has to be a compromise. It has to describe the requirements for customers, define the requirements in precise detail for developers and testers, as well as include information about future system evolution. Information on anticipated changes helps system designers to avoid restrictive design decisions and maintenance engineers to adapt the system to new requirements.

The level of detail that you should include in a requirements document depends on the type of system that is being developed and the development process used. Critical systems need detailed requirements because safety and security have to be analyzed in detail to find possible requirements errors. When the system is to be developed by a separate company (e.g., through outsourcing), the system specifications need to be detailed and precise. If an in-house, iterative development process is used, the requirements document can be less detailed. Details can be added to the requirements and ambiguities resolved during development of the system.

Figure 4.17 shows one possible organization for a requirements document that is based on an IEEE standard for requirements documents (IEEE 1998). This standard is a generic one that can be adapted to specific uses. In this case, the standard has been extended to include information about predicted system evolution. This information helps the maintainers of the system and allows designers to include support for future system features.


Preface: This defines the expected readership of the document and describes its version history, including a rationale for the creation of a new version and a summary of the changes made in each version.

Introduction: This describes the need for the system. It should briefly describe the system’s functions and explain how it will work with other systems. It should also describe how the system fits into the overall business or strategic objectives of the organization commissioning the software.

Glossary: This defines the technical terms used in the document. You should not make assumptions about the experience or expertise of the reader.

User requirements definition: Here, you describe the services provided for the user. The nonfunctional system requirements should also be described in this section. This description may use natural language, diagrams, or other notations that are understandable to customers. Product and process standards that must be followed should be specified.

System architecture: This chapter presents a high-level overview of the anticipated system architecture, showing the distribution of functions across system modules. Architectural components that are reused should be highlighted.

System requirements specification: This describes the functional and nonfunctional requirements in more detail. If necessary, further detail may also be added to the nonfunctional requirements. Interfaces to other systems may be defined.

System models: This chapter includes graphical system models showing the relationships between the system components and the system and its environment. Examples of possible models are object models, data-flow models, or semantic data models.

System evolution: This describes the fundamental assumptions on which the system is based, and any anticipated changes due to hardware evolution, changing user needs, and so on. This section is useful for system designers as it may help them avoid design decisions that would constrain likely future changes to the system.

Appendices: These provide detailed, specific information that is related to the application being developed, for example, hardware and database descriptions. Hardware requirements define the minimal and optimal configurations for the system. Database requirements define the logical organization of the data used by the system and the relationships between data.

Index: Several indexes to the document may be included. As well as a normal alphabetic index, there may be an index of diagrams, an index of functions, and so on.

Figure 4.17 The structure of a requirements document

Naturally, the information included in a requirements document depends on the type of software being developed and the approach to development that is to be used. A requirements document with a structure like that shown in Figure 4.17 might be produced for a complex engineering system that includes hardware and software developed by different companies. The requirements document is likely to be long and detailed. It is therefore important that a comprehensive table of contents and document index be included so that readers can easily find the information they need.

By contrast, the requirements document for an in-house software product will leave out many of the detailed chapters suggested above. The focus will be on defining the user requirements and high-level, nonfunctional system requirements. The system designers and programmers use their judgment to decide how to meet the outline user requirements for the system.


Requirements document standards

A number of large organizations, such as the U.S. Department of Defense and the IEEE, have defined standards for requirements documents. These are usually very generic but are nevertheless useful as a basis for developing more detailed organizational standards. The U.S. Institute of Electrical and Electronic Engineers (IEEE) is one of the best-known standards providers, and they have developed a standard for the structure of requirements documents. This standard is most appropriate for systems such as military command and control systems that have a long lifetime and are usually developed by a group of organizations.

http://software-engineering-book.com/web/requirements-standard/

4.5 Requirements validation

Requirements validation is the process of checking that requirements define the system that the customer really wants. It overlaps with elicitation and analysis, as it is concerned with finding problems with the requirements. Requirements validation is critically important because errors in a requirements document can lead to extensive rework costs when these problems are discovered during development or after the system is in service.

The cost of fixing a requirements problem by making a system change is usually much greater than repairing design or coding errors. A change to the requirements usually means that the system design and implementation must also be changed. Furthermore, the system must then be retested.

During the requirements validation process, different types of checks should be carried out on the requirements in the requirements document. These checks include:

1. Validity checks These check that the requirements reflect the real needs of system users. Because of changing circumstances, the user requirements may have changed since they were originally elicited.

2. Consistency checks Requirements in the document should not conflict. That is, there should not be contradictory constraints or different descriptions of the same system function.

3. Completeness checks The requirements document should include requirements that define all functions and the constraints intended by the system user.

4. Realism checks By using knowledge of existing technologies, the requirements should be checked to ensure that they can actually be implemented. These checks should also take account of the budget and schedule for the system development.

5. Verifiability To reduce the potential for dispute between customer and contractor, system requirements should always be written so that they are verifiable. This means that you should be able to write a set of tests that can demonstrate that the delivered system meets each specified requirement.
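One simple, partial automation of the verifiability check is to scan requirement texts for vague terms ("fast", "user-friendly") that cannot be tested directly. The Python sketch below is illustrative, not a tool described in this chapter; the term list and the requirement texts are example assumptions.

```python
# Vague terms that typically make a requirement hard or impossible to verify.
# This list is an illustrative sample, not a definitive catalogue.
VAGUE_TERMS = {"fast", "user-friendly", "flexible", "robust", "efficient", "adequate"}

def verifiability_problems(requirements: dict[str, str]) -> dict[str, list[str]]:
    """Return, for each requirement ID, the vague terms its text contains."""
    problems = {}
    for req_id, text in requirements.items():
        words = {w.strip(".,;").lower() for w in text.split()}
        hits = sorted(words & VAGUE_TERMS)
        if hits:
            problems[req_id] = hits
    return problems

reqs = {
    "R1": "The system shall respond to a record request within 2 seconds.",
    "R2": "The user interface shall be user-friendly and fast.",
}
print(verifiability_problems(reqs))   # {'R2': ['fast', 'user-friendly']}
```

R1 passes because it states a measurable threshold from which a test can be written; R2 is flagged because neither "user-friendly" nor "fast" can be demonstrated by a test until it is restated quantitatively.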


Requirements reviews

A requirements review is a process in which a group of people from the system customer and the system developer read the requirements document in detail and check for errors, anomalies, and inconsistencies. Once these have been detected and recorded, it is then up to the customer and the developer to negotiate how the identified problems should be solved.

http://software-engineering-book.com/web/requirements-reviews/

A number of requirements validation techniques can be used individually or in conjunction with one another:

1. Requirements reviews The requirements are analyzed systematically by a team of reviewers who check for errors and inconsistencies.

2. Prototyping This involves developing an executable model of a system and using this with end-users and customers to see if it meets their needs and expectations. Stakeholders experiment with the system and feed back requirements changes to the development team.

3. Test-case generation Requirements should be testable. If the tests for the requirements are devised as part of the validation process, this often reveals requirements problems. If a test is difficult or impossible to design, this usually means that the requirements will be difficult to implement and should be reconsidered. Developing tests from the user requirements before any code is written is an integral part of test-driven development.

You should not underestimate the problems involved in requirements validation. Ultimately, it is difficult to show that a set of requirements does in fact meet a user’s needs. Users need to picture the system in operation and imagine how that system would fit into their work. It is hard even for skilled computer professionals to perform this type of abstract analysis and harder still for system users.

As a result, you rarely find all requirements problems during the requirements validation process. Inevitably, further requirements changes will be needed to correct omissions and misunderstandings after agreement has been reached on the requirements document.

4.6 Requirements change

The requirements for large software systems are always changing. One reason for the frequent changes is that these systems are often developed to address “wicked” problems—problems that cannot be completely defined (Rittel and Webber 1973). Because the problem cannot be fully defined, the software requirements are bound to be incomplete. During the software development process, the stakeholders’ understanding of the problem is constantly changing (Figure 4.18). The system requirements must then evolve to reflect this changed problem understanding.

Figure 4.18 Requirements evolution: over time, the initial understanding of the problem and the initial requirements evolve into a changed understanding of the problem and changed requirements.

Once a system has been installed and is regularly used, new requirements inevitably emerge. This is partly a consequence of errors and omissions in the original requirements that have to be corrected. However, most changes to system requirements arise because of changes to the business environment of the system:

1. The business and technical environment of the system always changes after installation. New hardware may be introduced and existing hardware updated. It may be necessary to interface the system with other systems. Business priorities may change (with consequent changes in the system support required), and new legislation and regulations may be introduced that require system compliance.

2. The people who pay for a system and the users of that system are rarely the same people. System customers impose requirements because of organizational and budgetary constraints. These may conflict with end-user requirements, and, after delivery, new features may have to be added for user support if the system is to meet its goals.

3. Large systems usually have a diverse stakeholder community, with stakeholders having different requirements. Their priorities may be conflicting or contradictory. The final system requirements are inevitably a compromise, and some stakeholders have to be given priority. With experience, it is often discovered that the balance of support given to different stakeholders has to be changed and the requirements re-prioritized.

As requirements are evolving, you need to keep track of individual requirements and maintain links between dependent requirements so that you can assess the impact of requirements changes. You therefore need a formal process for making change proposals and linking these to system requirements. This process of “requirements management” should start as soon as a draft version of the requirements document is available.

Agile development processes have been designed to cope with requirements that change during the development process. In these processes, when a user proposes a requirements change, this change does not go through a formal change management process. Rather, the user has to prioritize that change and, if it is high priority, decide what system features that were planned for the next iteration should be dropped for the change to be implemented.

Enduring and volatile requirements

Some requirements are more susceptible to change than others. Enduring requirements are the requirements that are associated with the core, slow-to-change activities of an organization. Enduring requirements are associated with fundamental work activities. Volatile requirements are more likely to change. They are usually associated with supporting activities that reflect how the organization does its work rather than the work itself.

http://software-engineering-book.com/web/changing-requirements/

The problem with this approach is that users are not necessarily the best people to decide on whether or not a requirements change is cost-effective. In systems with multiple stakeholders, changes will benefit some stakeholders and not others. It is often better for an independent authority, who can balance the needs of all stakeholders, to decide on the changes that should be accepted.

4.6.1 Requirements management planning

Requirements management planning is concerned with establishing how a set of evolving requirements will be managed. During the planning stage, you have to decide on a number of issues:

1. Requirements identification Each requirement must be uniquely identified so that it can be cross-referenced with other requirements and used in traceability assessments.

2. A change management process This is the set of activities that assess the impact and cost of changes. I discuss this process in more detail in the following section.

3. Traceability policies These policies define the relationships between each requirement and between the requirements and the system design that should be recorded. The traceability policy should also define how these records should be maintained.

4. Tool support Requirements management involves the processing of large amounts of information about the requirements. Tools that may be used range from specialist requirements management systems to shared spreadsheets and simple database systems.

Requirements management needs automated support, and the software tools for this should be chosen during the planning phase. You need tool support for:

1. Requirements storage The requirements should be maintained in a secure, managed data store that is accessible to everyone involved in the requirements engineering process.

2. Change management The process of change management (Figure 4.19) is simplified if active tool support is available. Tools can keep track of suggested changes and responses to these suggestions.

3. Traceability management As discussed above, tool support for traceability allows related requirements to be discovered. Some tools are available which use natural language processing techniques to help discover possible relationships between requirements.

Figure 4.19 Requirements change management: an identified problem leads to problem analysis and change specification, then change analysis and costing, then change implementation, producing revised requirements.
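The first and third kinds of tool support above can be pictured together as a minimal requirements store that combines unique identification with traceability links. The Python sketch below is hypothetical, not the API of any real tool such as DOORS; the requirement IDs and the single "depends-on" link type are illustrative choices.

```python
class RequirementsStore:
    """A minimal store: uniquely identified requirements plus traceability links."""

    def __init__(self):
        self.requirements = {}   # id -> requirement text
        self.depends_on = {}     # id -> set of ids this requirement depends on

    def add(self, req_id, text, depends_on=()):
        self.requirements[req_id] = text
        self.depends_on[req_id] = set(depends_on)

    def impacted_by(self, changed_id):
        """All requirements that directly or transitively depend on the changed one."""
        impacted, frontier = set(), {changed_id}
        while frontier:
            current = frontier.pop()
            for req_id, deps in self.depends_on.items():
                if current in deps and req_id not in impacted:
                    impacted.add(req_id)
                    frontier.add(req_id)
        return impacted

store = RequirementsStore()
store.add("UR-1", "A doctor shall be able to set up a consultation.")
store.add("SR-1.1", "Display the patient record on all participants' screens.", ["UR-1"])
store.add("SR-1.2", "Open a text chat window for the consultation.", ["SR-1.1"])
print(sorted(store.impacted_by("UR-1")))   # ['SR-1.1', 'SR-1.2']
```

Following the dependency links transitively is exactly the impact assessment that the change analysis and costing stage relies on: a change to UR-1 is seen to ripple through both derived system requirements.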

For small systems, you do not need to use specialized requirements management tools. Requirements management can be supported using shared web documents, spreadsheets, and databases. However, for larger systems, more specialized tool support, using systems such as DOORS (IBM 2013), makes it much easier to keep track of a large number of changing requirements.

4.6.2 Requirements change management

Requirements change management (Figure 4.19) should be applied to all proposed changes to a system’s requirements after the requirements document has been approved. Change management is essential because you need to decide if the benefits of implementing new requirements are justified by the costs of implementation. The advantage of using a formal process for change management is that all change proposals are treated consistently and changes to the requirements document are made in a controlled way.

There are three principal stages to a change management process:

1. Problem analysis and change specification The process starts with an identified requirements problem or, sometimes, with a specific change proposal. During this stage, the problem or the change proposal is analyzed to check that it is valid. This analysis is fed back to the change requestor who may respond with a more specific requirements change proposal, or decide to withdraw the request.

2. Change analysis and costing The effect of the proposed change is assessed using traceability information and general knowledge of the system requirements. The cost of making the change is estimated in terms of modifications to the requirements document and, if appropriate, to the system design and implementation. Once this analysis is completed, a decision is made as to whether or not to proceed with the requirements change.


Requirements traceability

You need to keep track of the relationships between requirements, their sources, and the system design so that you can analyze the reasons for proposed changes and the impact that these changes are likely to have on other parts of the system. You need to be able to trace how a change ripples its way through the system.

http://software-engineering-book.com/web/traceability/

3. Change implementation The requirements document and, where necessary, the system design and implementation, are modified. You should organize the requirements document so that you can make changes to it without extensive rewriting or reorganization. As with programs, changeability in documents is achieved by minimizing external references and making the document sections as modular as possible. Thus, individual sections can be changed and replaced without affecting other parts of the document.
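The three stages above can be sketched as a simple state machine for a change request. The state names below are illustrative labels for the stages just described, not part of any standard process notation.

```python
# Allowed transitions between change-request states. Each transition corresponds
# to a stage of the change management process described above.
TRANSITIONS = {
    "submitted": {"analysis"},                          # enter problem analysis
    "analysis":  {"costing", "withdrawn", "submitted"}, # may bounce back to requestor
    "costing":   {"accepted", "rejected"},              # change analysis and costing
    "accepted":  {"implemented"},                       # change implementation
}

def advance(state: str, new_state: str) -> str:
    """Move a change request to a new state, enforcing the process order."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

state = "submitted"
for step in ("analysis", "costing", "accepted", "implemented"):
    state = advance(state, step)
print(state)   # implemented
```

Encoding the process this way makes the point of the preceding paragraph concrete: an urgent change that jumps straight from "submitted" to "implemented" is rejected, because skipping the analysis and costing stages is exactly what lets the document and the implementation drift out of step.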

If a new requirement has to be urgently implemented, there is always a temptation to change the system and then retrospectively modify the requirements document. This almost inevitably leads to the requirements specification and the system implementation getting out of step. Once system changes have been made, it is easy to forget to include these changes in the requirements document. In some circumstances, emergency changes to a system have to be made. In those cases, it is important that you update the requirements document as soon as possible in order to include the revised requirements.

Key Points

■ Requirements for a software system set out what the system should do and define constraints on its operation and implementation.

■ Functional requirements are statements of the services that the system must provide or are descriptions of how some computations must be carried out.

■ Non-functional requirements often constrain the system being developed and the development process being used. These might be product requirements, organizational requirements, or external requirements. They often relate to the emergent properties of the system and therefore apply to the system as a whole.

■ The requirements engineering process includes requirements elicitation, requirements specification, requirements validation, and requirements management.

■ Requirements elicitation is an iterative process that can be represented as a spiral of activities—requirements discovery, requirements classification and organization, requirements negotiation, and requirements documentation.


■ Requirements specification is the process of formally documenting the user and system requirements and creating a software requirements document.

■ The software requirements document is an agreed statement of the system requirements. It should be organized so that both system customers and software developers can use it.

■ Requirements validation is the process of checking the requirements for validity, consistency, completeness, realism, and verifiability.

■ Business, organizational, and technical changes inevitably lead to changes to the requirements for a software system. Requirements management is the process of managing and controlling these changes.

Further Reading

“Integrated Requirements Engineering: A Tutorial.” This is a tutorial paper that discusses requirements engineering activities and how these can be adapted to fit with modern software engineering practice. (I. Sommerville, IEEE Software, 22(1), January–February 2005) http://dx.doi.org/10.1109/MS.2005.13.

“Research Directions in Requirements Engineering.” This is a good survey of requirements engineering research that highlights future research challenges in the area to address issues such as scale and agility. (B. H. C. Cheng and J. M. Atlee, Proc. Conf. on Future of Software Engineering, IEEE Computer Society, 2007) http://dx.doi.org/10.1109/FOSE.2007.17.

Mastering the Requirements Process, 3rd ed. A well-written, easy-to-read book that is based on a particular method (VOLERE) but that also includes lots of good general advice about requirements engineering. (S. Robertson and J. Robertson, 2013, Addison-Wesley).

Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/requirements-and-design/

Requirements document for the insulin pump: http://software-engineering-book.com/case-studies/insulin-pump/

Mentcare system requirements information: http://software-engineering-book.com/case-studies/mentcare-system/


Exercises

4.1. Identify and briefly describe four types of requirements that may be defined for a computer-based system.

4.2. Discover ambiguities or omissions in the following statement of the requirements for part of a drone system intended for search and recovery:

The drone, a quad chopper, will be very useful in search and recovery operations, especially in remote areas or in extreme weather conditions. It will click high-resolution images. It will fly according to a path preset by a ground operator, but will be able to avoid obstacles on its own, returning to its original path whenever possible. The drone will also be able to identify various objects and match them to the target it is looking for.

4.3. Rewrite the above description using the structured approach described in this chapter. Resolve the identified ambiguities in a sensible way.

4.4. Write a set of non-functional requirements for the drone system, setting out its expected safety and response time.

4.5. Using the technique suggested here, where natural language descriptions are presented in a standard format, write plausible user requirements for the following functions:

■ An unattended petrol (gas) pump system that includes a credit card reader. The customer swipes the card through the reader, then specifies the amount of fuel required. The fuel is delivered and the customer’s account debited.

■ The cash-dispensing function in a bank ATM.

■ In an Internet banking system, a facility that allows customers to transfer funds from one account held with the bank to another account with the same bank.

4.6. Suggest how an engineer responsible for drawing up a system requirements specification might keep track of the relationships between functional and non-functional requirements.

4.7. Using your knowledge of how an ATM is used, develop a set of use cases that could serve as a basis for understanding the requirements for an ATM system.

4.8. To minimize mistakes during a requirements review, an organization decides to allocate two scribes to document the review session. Explain how this can be done.

4.9. When emergency changes have to be made to systems, the system software may have to be modified before changes to the requirements have been approved. Suggest a model of a process for making these modifications that will ensure that the requirements document and the system implementation do not become inconsistent.

4.10. You have taken a job with a software user who has contracted your previous employer to develop a system for them. You discover that your company’s interpretation of the requirements is different from the interpretation taken by your previous employer. Discuss what you should do in such a situation. You know that the costs to your current employer will increase if the ambiguities are not resolved. However, you also have a responsibility of confidentiality to your previous employer.

References

Crabtree, A. 2003. Designing Collaborative Systems: A Practical Guide to Ethnography. London: Springer-Verlag.

Davis, A. M. 1993. Software Requirements: Objects, Functions and States. Englewood Cliffs, NJ: Prentice-Hall.

IBM. 2013. “Rational Doors Next Generation: Requirements Engineering for Complex Systems.” https://jazz.net/products/rational-doors-next-generation/

IEEE. 1998. “IEEE Recommended Practice for Software Requirements Specifications.” In IEEE Software Engineering Standards Collection. Los Alamitos, CA: IEEE Computer Society Press.

Jacobsen, I., M. Christerson, P. Jonsson, and G. Overgaard. 1993. Object-Oriented Software Engineering. Wokingham, UK: Addison-Wesley.

Martin, D., and I. Sommerville. 2004. “Patterns of Cooperative Interaction: Linking Ethnomethodology and Design.” ACM Transactions on Computer-Human Interaction 11 (1) (March 1): 59–89. doi:10.1145/972648.972651.

Rittel, H., and M. Webber. 1973. “Dilemmas in a General Theory of Planning.” Policy Sciences 4: 155–169. doi:10.1007/BF01405730.

Robertson, S., and J. Robertson. 2013. Mastering the Requirements Process, 3rd ed. Boston: Addison-Wesley.

Sommerville, I., T. Rodden, P. Sawyer, R. Bentley, and M. Twidale. 1993. “Integrating Ethnography into the Requirements Engineering Process.” In RE’93, 165–173. San Diego, CA: IEEE Computer Society Press. doi:10.1109/ISRE.1993.324821.

Stevens, P., and R. Pooley. 2006. Using UML: Software Engineering with Objects and Components, 2nd ed. Harlow, UK: Addison-Wesley.

Suchman, L. 1983. “Office Procedures as Practical Action: Models of Work and System Design.” ACM Transactions on Office Information Systems 1 (3): 320–328. doi:10.1145/357442.357445.

Viller, S., and I. Sommerville. 2000. “Ethnographically Informed Analysis for Software Engineers.” Int. J. of Human-Computer Studies 53 (1): 169–196. doi:10.1006/ijhc.2000.0370.

5

System modeling

Objectives

The aim of this chapter is to introduce system models that may be developed as part of requirements engineering and system design processes. When you have read the chapter, you will:

■ understand how graphical models can be used to represent software systems and why several types of model are needed to fully represent a system;

■ understand the fundamental system modeling perspectives of context, interaction, structure, and behavior;

■ understand the principal diagram types in the Unified Modeling Language (UML) and how these diagrams may be used in system modeling;

■ have been introduced to model-driven engineering, where an executable system is automatically generated from structural and behavioral models.

Contents

5.1 Context models

5.2 Interaction models

5.3 Structural models

5.4 Behavioral models

5.5 Model-driven engineering


System modeling is the process of developing abstract models of a system, with each model presenting a different view or perspective of that system. System modeling now usually means representing a system using some kind of graphical notation based on diagram types in the Unified Modeling Language (UML). However, it is also possible to develop formal (mathematical) models of a system, usually as a detailed system specification. I cover graphical modeling using the UML here, and formal modeling is briefly discussed in Chapter 10.

Models are used during the requirements engineering process to help derive the detailed requirements for a system, during the design process to describe the system to engineers implementing the system, and after implementation to document the system’s structure and operation. You may develop models of both the existing system and the system to be developed:

1. Models of the existing system are used during requirements engineering. They help clarify what the existing system does, and they can be used to focus a stakeholder discussion on its strengths and weaknesses.

2. Models of the new system are used during requirements engineering to help explain the proposed requirements to other system stakeholders. Engineers use these models to discuss design proposals and to document the system for implementation. If you use a model-driven engineering process (Brambilla, Cabot, and Wimmer 2012), you can generate a complete or partial system implementation from system models.

It is important to understand that a system model is not a complete representation of a system. It purposely leaves out detail to make it easier to understand. A model is an abstraction of the system being studied rather than an alternative representation of that system. A representation of a system should maintain all the information about the entity being represented. An abstraction deliberately simplifies a system design and picks out the most salient characteristics. For example, the PowerPoint slides that accompany this book are an abstraction of the book’s key points. However, if the book were translated from English into Italian, this would be an alternative representation. The translator’s intention would be to maintain all the information as it is presented in English.

You may develop different models to represent the system from different perspectives. For example:

1. An external perspective, where you model the context or environment of the system.

2. An interaction perspective, where you model the interactions between a system and its environment, or between the components of a system.

3. A structural perspective, where you model the organization of a system or the structure of the data processed by the system.

4. A behavioral perspective, where you model the dynamic behavior of the system and how it responds to events.


The Unified Modeling Language

The Unified Modeling Language (UML) is a set of 13 different diagram types that may be used to model software systems. It emerged from work in the 1990s on object-oriented modeling, where similar object-oriented notations were integrated to create the UML. A major revision (UML 2) was finalized in 2004. The UML is universally accepted as the standard approach for developing models of software systems. Variants, such as SysML, have been proposed for more general system modeling.

http://software-engineering-book.com/web/uml/

When developing system models, you can often be flexible in the way that the graphical notation is used. You do not always need to stick rigidly to the details of a notation. The detail and rigor of a model depend on how you intend to use it. There are three ways in which graphical models are commonly used:

1. As a way to stimulate and focus discussion about an existing or proposed system. The purpose of the model is to stimulate and focus discussion among the software engineers involved in developing the system. The models may be incomplete (as long as they cover the key points of the discussion), and they may use the modeling notation informally. This is how models are normally used in agile modeling (Ambler and Jeffries 2002).

2. As a way of documenting an existing system. When models are used as documentation, they do not have to be complete, as you may only need to use models to document some parts of a system. However, these models have to be correct: they should use the notation correctly and be an accurate description of the system.

3. As a detailed system description that can be used to generate a system implementation. Where models are used as part of a model-based development process, the system models have to be both complete and correct. They are used as a basis for generating the source code of the system, and you therefore have to be very careful not to confuse similar symbols, such as stick and block arrowheads, that may have different meanings.

In this chapter, I use diagrams defined in the Unified Modeling Language (UML) (Rumbaugh, Jacobson, and Booch 2004; Booch, Rumbaugh, and Jacobson 2005), which has become a standard language for object-oriented modeling. The UML has 13 diagram types and so supports the creation of many different types of system model. However, a survey (Erickson and Siau 2007) showed that most users of the UML thought that five diagram types could represent the essentials of a system. I therefore concentrate on these five UML diagram types here:


1. Activity diagrams, which show the activities involved in a process or in data processing.

2. Use case diagrams, which show the interactions between a system and its environment.

3. Sequence diagrams, which show interactions between actors and the system and between system components.

4. Class diagrams, which show the object classes in the system and the associations between these classes.

5. State diagrams, which show how the system reacts to internal and external events.

5.1 Context models

At an early stage in the specification of a system, you should decide on the system boundaries, that is, on what is and is not part of the system being developed. This involves working with system stakeholders to decide what functionality should be included in the system and what processing and operations should be carried out in the system’s operational environment. You may decide that automated support for some business processes should be implemented in the software being developed but that other processes should be manual or supported by different systems. You should look at possible overlaps in functionality with existing systems and decide where new functionality should be implemented. These decisions should be made early in the process to limit the system costs and the time needed for understanding the system requirements and design.

In some cases, the boundary between a system and its environment is relatively clear. For example, where an automated system is replacing an existing manual or computerized system, the environment of the new system is usually the same as the existing system’s environment. In other cases, there is more flexibility, and you decide what constitutes the boundary between the system and its environment during the requirements engineering process.

For example, say you are developing the specification for the Mentcare patient information system. This system is intended to manage information about patients attending mental health clinics and the treatments that have been prescribed. In developing the specification for this system, you have to decide whether the system should focus exclusively on collecting information about consultations (using other systems to collect personal information about patients) or whether it should also collect personal patient information. The advantage of relying on other systems for patient information is that you avoid duplicating data. The major disadvantage, however, is that using other systems may make it slower to access information, and if these systems are unavailable, then it may be impossible to use the Mentcare system.

In some situations, the user base for a system is very diverse, and users have a wide range of different system requirements. You may decide not to define boundaries explicitly but instead to develop a configurable system that can be adapted to the needs of different users. This was the approach that we adopted in the iLearn system, introduced in Chapter 1. There, users range from very young children who can’t read through to young adults, their teachers, and school administrators. Because these groups need different system boundaries, we specified a configuration system that would allow the boundaries to be specified when the system was deployed.

[Figure 5.1: The context of the Mentcare system. The Mentcare system is shown surrounded by the «system» elements in its environment: the patient record system, the admissions system, the management reporting system, the prescription system, the HC statistics system, and the appointments system.]

The definition of a system boundary is not a value-free judgment. Social and organizational concerns may mean that the position of a system boundary may be determined by nontechnical factors. For example, a system boundary may be deliberately positioned so that the complete analysis process can be carried out on one site; it may be chosen so that a particularly difficult manager need not be consulted; and it may be positioned so that the system cost is increased and the system development division must therefore expand to design and implement the system.

Once some decisions on the boundaries of the system have been made, part of the analysis activity is the definition of that context and the dependencies that a system has on its environment. Normally, producing a simple architectural model is the first step in this activity.

Figure 5.1 is a context model that shows the Mentcare system and the other systems in its environment. You can see that the Mentcare system is connected to an appointments system and a more general patient record system with which it shares data. The system is also connected to systems for management reporting and hospital admissions, and a statistics system that collects information for research. Finally, it makes use of a prescription system to generate prescriptions for patients’ medication.

Context models normally show that the environment includes several other automated systems. However, they do not show the types of relationships between the systems in the environment and the system that is being specified. External systems might produce data for or consume data from the system. They might share data with the system, or they might be connected directly, through a network, or not connected at all. They might be physically co-located or located in separate buildings. All of these relations may affect the requirements and design of the system being defined and so must be taken into account. Therefore, simple context models are used along with other models, such as business process models. These describe human and automated processes in which particular software systems are used.

[Figure 5.2: A process model of involuntary detention. The activity diagram shows activities such as confirming the detention decision, finding a secure place, informing the patient of rights, informing social care and next of kin, recording the detention decision, updating the detention register, and then either admitting the patient to a hospital ([not dangerous]) or, for a [dangerous] patient, transferring them to a secure hospital ([available]) or to a police station ([not available]). The Mentcare and Admissions systems support these activities.]

UML activity diagrams may be used to show the business processes in which systems are used. Figure 5.2 is a UML activity diagram that shows where the Mentcare system is used in an important mental health care process: involuntary detention.

Sometimes, patients who are suffering from mental health problems may be a danger to others or to themselves. They may therefore have to be detained against their will in a hospital so that treatment can be administered. Such detention is subject to strict legal safeguards; for example, the decision to detain a patient must be regularly reviewed so that people are not held indefinitely without good reason. One critical function of the Mentcare system is to ensure that such safeguards are implemented and that the rights of patients are respected.

UML activity diagrams show the activities in a process and the flow of control from one activity to another. The start of a process is indicated by a filled circle, the end by a filled circle inside another circle. Rectangles with round corners represent activities, that is, the specific subprocesses that must be carried out. You may include objects in activity charts. Figure 5.2 shows the systems that are used to support different subprocesses within the involuntary detention process. I have shown that these are separate systems by using the UML stereotype feature, where the type of entity in the box between chevrons is shown.

Arrows represent the flow of work from one activity to another, and a solid bar indicates activity coordination. When the flow from more than one activity leads to a solid bar, then all of these activities must be complete before progress is possible. When the flow from a solid bar leads to a number of activities, these may be executed in parallel. Therefore, in Figure 5.2, the activities to inform social care and the patient’s next of kin, as well as to update the detention register, may be concurrent.

[Figure 5.3: The Transfer-data use case, shown as an ellipse labeled “Transfer data” with two actors, the medical receptionist and the patient record system, drawn as stick figures.]

Arrows may be annotated with guards (in square brackets) that specify when that flow is followed. In Figure 5.2, you can see guards showing the flows for patients who are dangerous and not dangerous to society. Patients who are dangerous to society must be detained in a secure facility. However, patients who are suicidal and are a danger to themselves may be admitted to an appropriate ward in a hospital, where they can be kept under close supervision.
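Guarded flows of this kind map naturally onto conditional logic in code. The Python sketch below shows one way the detention-decision guards of Figure 5.2 might be expressed; the function name, parameters, and returned strings are hypothetical, invented purely for illustration.

```python
def route_detained_patient(dangerous: bool, secure_place_available: bool) -> str:
    """Follow the guarded flows of the involuntary detention process.

    The diagram's guards appear here as boolean conditions:
    [dangerous] selects the secure-facility branch, and [available] /
    [not available] select where a dangerous patient is transferred.
    """
    if dangerous:
        # [dangerous]: a secure place must be found first.
        if secure_place_available:
            return "transfer to secure hospital"   # [available]
        return "transfer to police station"        # [not available]
    # [not dangerous]: admit to an appropriate hospital ward.
    return "admit to hospital"
```

After the routing decision, the concurrent activities (informing social care, informing next of kin, updating the detention register) could run in any order or in parallel, which is exactly what the solid synchronization bar in the diagram expresses.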

5.2 Interaction models

All systems involve interaction of some kind. This can be user interaction, which involves user inputs and outputs; interaction between the software being developed and other systems in its environment; or interaction between the components of a software system. User interaction modeling is important as it helps to identify user requirements. Modeling system-to-system interaction highlights the communication problems that may arise. Modeling component interaction helps us understand if a proposed system structure is likely to deliver the required system performance and dependability.

This section discusses two related approaches to interaction modeling:

1. Use case modeling, which is mostly used to model interactions between a system and external agents (human users or other systems).

2. Sequence diagrams, which are used to model interactions between system components, although external agents may also be included.

Use case models and sequence diagrams present interactions at different levels of detail and so may be used together. For example, the details of the interactions involved in a high-level use case may be documented in a sequence diagram. The UML also includes communication diagrams that can be used to model interactions. I don’t describe this diagram type because communication diagrams are simply an alternative representation of sequence diagrams.

5.2.1 Use case modeling

Use case modeling was originally developed by Ivar Jacobson in the 1990s (Jacobson et al. 1993), and a UML diagram type to support use case modeling is part of the UML. A use case can be taken as a simple description of what a user expects from a system in that interaction. I have discussed use cases for requirements elicitation in Chapter 4. As I said in Chapter 4, I find use case models to be more useful in the early stages of system design rather than in requirements engineering.

Figure 5.4: Tabular description of the Transfer-data use case

Mentcare system: Transfer data
Actors: Medical receptionist, Patient records system (PRS)
Description: A receptionist may transfer data from the Mentcare system to a general patient record database that is maintained by a health authority. The information transferred may either be updated personal information (address, phone number, etc.) or a summary of the patient’s diagnosis and treatment.
Data: Patient’s personal information, treatment summary
Stimulus: User command issued by medical receptionist
Response: Confirmation that PRS has been updated
Comments: The receptionist must have appropriate security permissions to access the patient information and the PRS.

Each use case represents a discrete task that involves external interaction with a system. In its simplest form, a use case is shown as an ellipse, with the actors involved in the use case represented as stick figures. Figure 5.3 shows a use case from the Mentcare system that represents the task of uploading data from the Mentcare system to a more general patient record system. This more general system maintains summary data about a patient rather than data about each consultation, which is recorded in the Mentcare system.

Notice that there are two actors in this use case: the operator who is transferring the data and the patient record system. The stick figure notation was originally developed to cover human interaction, but it is also used to represent other external systems and hardware. Formally, use case diagrams should use lines without arrows, as arrows in the UML indicate the direction of flow of messages. Obviously, in a use case, messages pass in both directions. However, the arrows in Figure 5.3 are used informally to indicate that the medical receptionist initiates the transaction and data is transferred to the patient record system.

Use case diagrams give a simple overview of an interaction, and you need to add more detail for a complete interaction description. This detail can either be a simple textual description, a structured description in a table, or a sequence diagram. You choose the most appropriate format depending on the use case and the level of detail that you think is required in the model. I find a standard tabular format to be the most useful. Figure 5.4 shows a tabular description of the “Transfer data” use case.
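Because the tabular format has a fixed set of fields, a use case description can also be kept in machine-readable form alongside other project artifacts. The Python sketch below is one possible encoding; the class and field names simply mirror the table headings in Figure 5.4 and are not part of the UML.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UseCaseDescription:
    """One field per heading of the tabular use case format."""
    name: str
    actors: List[str] = field(default_factory=list)
    description: str = ""
    data: str = ""
    stimulus: str = ""
    response: str = ""
    comments: str = ""

# The Transfer-data use case, transcribed from the table.
transfer_data = UseCaseDescription(
    name="Mentcare system: Transfer data",
    actors=["Medical receptionist", "Patient records system (PRS)"],
    description=("A receptionist may transfer data from the Mentcare system "
                 "to a general patient record database maintained by a "
                 "health authority."),
    data="Patient's personal information, treatment summary",
    stimulus="User command issued by medical receptionist",
    response="Confirmation that PRS has been updated",
    comments=("The receptionist must have appropriate security permissions "
              "to access the patient information and the PRS."),
)
```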

Composite use case diagrams show a number of different use cases. Sometimes it is possible to include all possible interactions within a system in a single composite use case diagram. However, this may be impossible because of the number of use cases. In such cases, you may develop several diagrams, each of which shows related use cases. For example, Figure 5.5 shows all of the use cases in the Mentcare system in which the actor “Medical receptionist” is involved. Each of these should be accompanied by a more detailed description.

[Figure 5.5: Use cases involving the role “Medical receptionist”: Register patient, Unregister patient, View patient info., Transfer data, and Contact patient.]

The UML includes a number of constructs for sharing all or part of a use case in other use case diagrams. While these constructs can sometimes be helpful for system designers, my experience is that many people, especially end-users, find them difficult to understand. For this reason, these constructs are not described here.

5.2.2 Sequence diagrams

Sequence diagrams in the UML are primarily used to model the interactions between the actors and the objects in a system and the interactions between the objects themselves. The UML has a rich syntax for sequence diagrams, which allows many different kinds of interaction to be modeled. As space does not allow covering all possibilities here, the focus will be on the basics of this diagram type.

As the name implies, a sequence diagram shows the sequence of interactions that take place during a particular use case or use case instance. Figure 5.6 is an example of a sequence diagram that illustrates the basics of the notation. This diagram models the interactions involved in the View patient information use case, where a medical receptionist can see some patient information.

The objects and actors involved are listed along the top of the diagram, with a dotted line drawn vertically from these. Annotated arrows indicate interactions between objects. The rectangle on the dotted lines indicates the lifeline of the object concerned (i.e., the time that object instance is involved in the computation). You read the sequence of interactions from top to bottom. The annotations on the arrows indicate the calls to the objects, their parameters, and the return values. This example also shows the notation used to denote alternatives. A box named alt is used with the conditions indicated in square brackets, with alternative interaction options separated by a dotted line.

[Figure 5.6: Sequence diagram for View patient information. The medical receptionist sends ViewInfo (PID) to P: PatientInfo, which sends report (Info, PID, UID) to D: Mentcare-DB; the database sends authorize (Info, UID) to AS: Authorization and receives an authorization. An alt box then shows two outcomes: [authorization OK] returns the patient info, while [authorization fail] returns Error (no access).]

You can read Figure 5.6 as follows:

1. The medical receptionist triggers the ViewInfo method in an instance P of the PatientInfo object class, supplying the patient’s identifier, PID, to identify the required information. P is a user interface object, which is displayed as a form showing patient information.

2. The instance P calls the database to return the information required, supplying the receptionist’s identifier to allow security checking. (At this stage, it is not important where the receptionist’s UID comes from.)

3. The database checks with an authorization system that the receptionist is authorized for this action.

4. If authorized, the patient information is returned and is displayed on a form on the user’s screen. If authorization fails, then an error message is returned. The box denoted by “alt” in the top-left corner is a choice box indicating that one of the contained interactions will be executed. The condition that selects the choice is shown in square brackets.
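A sequence of messages like this can also be mirrored in code: each lifeline becomes an object, and the alt box becomes a conditional. The Python sketch below follows the lifelines and message names of Figure 5.6, but it is illustrative only; the permission rule and data values are invented stand-ins, not part of the Mentcare design.

```python
class Authorization:
    """The AS lifeline: decides whether a user may see patient information."""
    def __init__(self, permitted_users):
        self.permitted_users = set(permitted_users)

    def authorize(self, info, uid):
        # Hypothetical rule: a fixed set of permitted user identifiers.
        return uid in self.permitted_users

class MentcareDB:
    """The D lifeline: returns patient info after an authorization check."""
    def __init__(self, auth, records):
        self.auth = auth
        self.records = records

    def report(self, info, pid, uid):
        # The alt box: [authorization OK] vs. [authorization fail].
        if self.auth.authorize(info, uid):
            return self.records[pid]      # patient info returned
        return "Error (no access)"        # error message returned

class PatientInfo:
    """The P lifeline: the user interface object the receptionist triggers."""
    def __init__(self, db, uid):
        self.db = db
        self.uid = uid

    def view_info(self, pid):
        # ViewInfo (PID) leads to report (Info, PID, UID) on the database.
        return self.db.report("summary", pid, self.uid)
```

Note how the decision about where the UID comes from is localized in PatientInfo, matching the observation that such details can be deferred in the model.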

Figure 5.7 is a further example of a sequence diagram from the same system that illustrates two additional features. These are the direct communication between the actors in the system and the creation of objects as part of a sequence of operations. In this example, an object of type Summary is created to hold the summary data that is to be uploaded to a national PRS (patient records system).

[Figure 5.7: Sequence diagram for Transfer Data. The medical receptionist logs in to the PRS and then, in an alt box, either follows [sendInfo], where updateInfo ( ) leads to updatePRS (UID), an authorize (TF, UID) check, and an update (PID) of the PRS, or follows [sendSummary], where UpdateSummary ( ) leads to summarize (UID), an authorize (TF, UID) check, the creation of a :summary object, and an update (PID) of the PRS. In each case an OK message is returned, and the receptionist finally logs out.]

You can read this diagram as follows:

1. The receptionist logs on to the PRS.

2. Two options are available (as shown in the “alt” box). These allow the direct transfer of updated patient information from the Mentcare database to the PRS and the transfer of summary health data from the Mentcare database to the PRS.

3. In each case, the receptionist’s permissions are checked using the authorization system.

4. Personal information may be transferred directly from the user interface object to the PRS. Alternatively, a summary record may be created from the database, and that record is then transferred.

5. On completion of the transfer, the PRS issues a status message and the user logs off.

Unless you are using sequence diagrams for code generation or detailed documentation, you don’t have to include every interaction in these diagrams. If you develop system models early in the development process to support requirements engineering and high-level design, there will be many interactions that depend on implementation decisions. For example, in Figure 5.7 the decision on how to get the user identifier to check authorization is one that can be delayed. In an implementation, this might involve interacting with a User object. As this is not important at this stage, you do not need to include it in the sequence diagram.

5.3 Structural models

Structural models of software display the organization of a system in terms of the components that make up that system and their relationships. Structural models may be static models, which show the organization of the system design, or dynamic models, which show the organization of the system when it is executing. These are not the same things: the dynamic organization of a system as a set of interacting threads may be very different from a static model of the system components.

You create structural models of a system when you are discussing and designing the system architecture. These can be models of the overall system architecture or more detailed models of the objects in the system and their relationships.

In this section, I focus on the use of class diagrams for modeling the static structure of the object classes in a software system. Architectural design is an important topic in software engineering, and UML component, package, and deployment diagrams may all be used when presenting architectural models. I cover architectural modeling in Chapters 6 and 17.

5.3.1 Class diagrams

Class diagrams are used when developing an object-oriented system model to show the classes in a system and the associations between these classes. Loosely, an object class can be thought of as a general definition of one kind of system object. An association is a link between classes indicating that some relationship exists between these classes. Consequently, each class may have to have some knowledge of its associated class.

When you are developing models during the early stages of the software engineering process, objects represent something in the real world, such as a patient, a prescription, or a doctor. As an implementation is developed, you define implementation objects to represent data that is manipulated by the system. In this section, the focus is on the modeling of real-world objects as part of the requirements or early software design processes. A similar approach is used for data structure modeling.

[Figure 5.8: UML classes and association. Two classes, Patient and Patient record, are linked by an association with a multiplicity of 1 at each end.]

[Figure 5.9: Classes and associations in the Mentcare system. The Patient class is linked to Consultant (1 referred-to 1..*), Condition (1..* diagnosed-with 1..*), General practitioner (1..* referred-by 1), and Consultation (1..* attends 1..*). A Consultation involves 1..4 Hospital doctors and prescribes 1..* Medications and 1..* Treatments.]

Class diagrams in the UML can be expressed at different levels of detail. When you are developing a model, the first stage is usually to look at the world, identify the essential objects, and represent these as classes. The simplest way of writing these diagrams is to write the class name in a box. You can also note the existence of an association by drawing a line between classes. For example, Figure 5.8 is a simple class diagram showing two classes, Patient and Patient Record, with an association between them. At this stage, you do not need to say what the association is.

Figure 5.9 develops the simple class diagram in Figure 5.8 to show that objects of class Patient are also involved in relationships with a number of other classes. In this example, I show that you can name associations to give the reader an indication of the type of relationship that exists.

Figures 5.8 and 5.9 show an important feature of class diagrams: the ability to show how many objects are involved in the association. In Figure 5.8, each end of the association is annotated with a 1, meaning that there is a 1:1 relationship between objects of these classes. That is, each patient has exactly one record, and each record maintains information about exactly one patient.

As you can see from Figure 5.9, other multiplicities are possible. You can define that an exact number of objects are involved (e.g., 1..4) or, by using a *, indicate that there are an indefinite number of objects involved in the association. For example, the (1..*) multiplicity in Figure 5.9 on the relationship between Patient and Condition shows that a patient may suffer from several conditions and that the same condition may be associated with several patients.
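Multiplicities constrain how many objects may take part in an association, and those constraints carry over into an implementation. The sketch below uses Python; the class names come from Figures 5.8 and 5.9, but the way the links are stored and maintained is an invented illustration, not the Mentcare design. It shows a 1:1 Patient–PatientRecord association and a many-to-many (1..* at both ends) Patient–Condition association.

```python
class PatientRecord:
    def __init__(self, patient):
        # 1:1 multiplicity: a record belongs to exactly one patient,
        # and creating the record fixes that patient's single record.
        self.patient = patient
        patient.record = self

class Condition:
    def __init__(self, name):
        self.name = name
        self.patients = []    # 1..*: several patients may share a condition

class Patient:
    def __init__(self, name):
        self.name = name
        self.record = None    # set exactly once, by PatientRecord
        self.conditions = []  # 1..*: a patient may have several conditions

    def diagnose(self, condition):
        # Maintain both ends of the many-to-many association, so that
        # navigation works from either class.
        self.conditions.append(condition)
        condition.patients.append(self)
```

Keeping both ends of an association consistent, as diagnose does here, is the implementation counterpart of the rule that each class may need some knowledge of its associated class.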

[Figure 5.10: A Consultation class. The top section names the class; the middle section lists its attributes (Doctors, Date, Time, Clinic, Reason, Medication prescribed, Treatment prescribed, Voice notes, Transcript, ...); the bottom section lists its operations (New ( ), Prescribe ( ), RecordNotes ( ), Transcribe ( ), ...).]

At this level of detail, class diagrams look like semantic data models. Semantic data models are used in database design. They show the data entities, their associated attributes, and the relations between these entities (Hull and King 1987). The UML does not include a diagram type for database modeling, as it models data using objects and their relationships. However, you can use the UML to represent a semantic data model. You can think of entities in a semantic data model as simplified object classes (they have no operations), attributes as object class attributes, and relations as named associations between object classes.

When showing the associations between classes, it is best to represent these classes in the simplest possible way, without attributes or operations. To define objects in more detail, you add information about their attributes (the object’s characteristics) and operations (the object’s functions). For example, a Patient object has the attribute Address, and you may include an operation called ChangeAddress, which is called when a patient indicates that he or she has moved from one address to another.

In the UML, you show attributes and operations by extending the simple rectangle that represents a class. I illustrate this in Figure 5.10, which shows an object representing a consultation between doctor and patient:

1. The name of the object class is in the top section.

2. The class attributes are in the middle section. This includes the attribute names and, optionally, their types. I don’t show the types in Figure 5.10.

3. The operations (called methods in Java and other OO programming languages) associated with the object class are in the lower section of the rectangle. I show some but not all operations in Figure 5.10.

In the example shown in Figure 5.10, it is assumed that doctors record voice notes that are transcribed later to record details of the consultation. To prescribe medication, the doctor involved must use the Prescribe method to generate an electronic prescription.
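The attribute and operation sections of a class box translate directly into fields and methods. In this Python sketch, the names follow the Figure 5.10 class box; the method bodies are minimal placeholders, invented for illustration, since the diagram itself says nothing about behavior.

```python
class Consultation:
    """Fields mirror the attribute section of the Figure 5.10 class box."""
    def __init__(self, doctors, date, time, clinic, reason):
        self.doctors = doctors
        self.date = date
        self.time = time
        self.clinic = clinic
        self.reason = reason
        self.medication_prescribed = []
        self.treatment_prescribed = []
        self.voice_notes = []
        self.transcript = ""

    def prescribe(self, medication):
        # Stands in for generating an electronic prescription.
        self.medication_prescribed.append(medication)

    def record_notes(self, note):
        # Doctors record voice notes during the consultation.
        self.voice_notes.append(note)

    def transcribe(self):
        # Voice notes are transcribed later to record the details.
        self.transcript = " ".join(self.voice_notes)
```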

[Figure 5.11: A generalization hierarchy. Doctor is specialized into Hospital doctor and General practitioner; Hospital doctor is specialized into Consultant and Team doctor; Team doctor is specialized into Trainee doctor and Qualified doctor.]

5.3.2 Generalization

Generalization is an everyday technique that we use to manage complexity. Rather than learn the detailed characteristics of everything that we experience, we learn about general classes (animals, cars, houses, etc.) and learn the characteristics of these classes. We then reuse knowledge by classifying things and focus on the differences between them and their class. For example, squirrels and rats are members of the class “rodents,” and so share the characteristics of rodents. General statements apply to all class members; for example, all rodents have teeth for gnawing.

When you are modeling systems, it is often useful to examine the classes in a system to see if there is scope for generalization and class creation. This means that common information will be maintained in one place only. This is good design practice as it means that, if changes are proposed, then you do not have to look at all classes in the system to see if they are affected by the change. You can make the changes at the most general level. In object-oriented languages, such as Java, generalization is implemented using the class inheritance mechanisms built into the language.

The UML has a specific type of association to denote generalization, as illustrated in Figure 5.11. The generalization is shown as an arrowhead pointing up to the more general class. This indicates that general practitioners and hospital doctors can be generalized as doctors and that there are three types of Hospital Doctor: those who have just graduated from medical school and have to be supervised (Trainee Doctor); those who can work unsupervised as part of a consultant’s team (Registered Doctor); and consultants, who are senior doctors with full decision-making responsibilities.

In a generalization, the attributes and operations associated with higher-level classes are also associated with the lower-level classes. The lower-level classes are subclasses that inherit the attributes and operations from their superclasses. These lower-level classes then add more specific attributes and operations.


[Figure 5.12 A generalization hierarchy with added detail. Doctor: Name, Phone #, Email; register(), de-register(). Hospital doctor: Staff #, Pager #. General practitioner: Practice, Address.]

For example, all doctors have a name and phone number, and all hospital doctors have a staff number and carry a pager. General practitioners don’t have these attributes, as they work independently, but they have an individual practice name and address. Figure 5.12 shows part of the generalization hierarchy, which I have extended with class attributes, for the class Doctor. The operations associated with the class Doctor are intended to register and de-register that doctor with the Mentcare system.
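As the text notes, generalization maps directly onto class inheritance in Java. A minimal sketch of the Figure 5.12 hierarchy might look like this; the attribute and operation names follow the figure, while the field types, the `registered` flag, and the trivial method bodies are assumptions made for illustration:

```java
// Sketch of the Doctor generalization hierarchy from Figure 5.12.
// Attribute and operation names follow the figure; types and the
// trivial method bodies are illustrative assumptions.
class Doctor {
    String name;
    String phoneNumber;
    String email;
    boolean registered;

    void register() { registered = true; }      // register with Mentcare
    void deRegister() { registered = false; }   // de-register from Mentcare
}

// Hospital doctors inherit name, phone, etc., and add their own attributes.
class HospitalDoctor extends Doctor {
    String staffNumber;
    String pagerNumber;
}

// General practitioners instead add a practice name and address.
class GeneralPractitioner extends Doctor {
    String practice;
    String address;
}
```

Because HospitalDoctor inherits from Doctor, a hospital doctor object can be registered using the operation defined once, at the most general level.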

5.3.3 Aggregation

Objects in the real world are often made up of different parts. For example, a study pack for a course may be composed of a book, PowerPoint slides, quizzes, and recommendations for further reading. Sometimes in a system model, you need to illustrate this. The UML provides a special type of association between classes called aggregation, which means that one object (the whole) is composed of other objects (the parts). To define aggregation, a diamond shape is added to the link next to the class that represents the whole.

Figure 5.13 shows that a patient record is an aggregate of Patient and an indefinite number of Consultations. That is, the record maintains personal patient information as well as an individual record for each consultation with a doctor.
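In code, aggregation is typically realized by the whole holding references to its parts. A sketch of the Figure 5.13 relationship, with the class names taken from the figure and everything else (fields, methods) assumed for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the Figure 5.13 aggregation: a PatientRecord (the whole)
// is composed of one Patient and 1..* Consultations (the parts).
// Class names follow the figure; the fields and methods are assumptions.
class Patient { }
class Consultation { }

class PatientRecord {
    private final Patient patient;                        // exactly one patient
    private final List<Consultation> consultations = new ArrayList<>();

    PatientRecord(Patient patient) {
        this.patient = patient;
    }

    // Each consultation with a doctor adds an individual record.
    void addConsultation(Consultation c) {
        consultations.add(c);
    }

    int consultationCount() {
        return consultations.size();
    }
}
```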

[Figure 5.13 The aggregation association: Patient record (1) is composed of one Patient and 1..* Consultations.]


Data flow diagrams

Data-flow diagrams (DFDs) are system models that show a functional perspective where each transformation represents a single function or process. DFDs are used to show how data flows through a sequence of processing steps. For example, a processing step could be the filtering of duplicate records in a customer database. The data is transformed at each step before moving on to the next stage. These processing steps or transformations represent software processes or functions, where data-flow diagrams are used to document a software design. Activity diagrams in the UML may be used to represent DFDs.

http://software-engineering-book.com/web/dfds/

5.4 Behavioral models

Behavioral models are models of the dynamic behavior of a system as it is executing. They show what happens or what is supposed to happen when a system responds to a stimulus from its environment. These stimuli may be either data or events:

1. Data becomes available that has to be processed by the system. The availability of the data triggers the processing.

2. An event happens that triggers system processing. Events may have associated data, although this is not always the case.

Many business systems are data-processing systems that are primarily driven by data. They are controlled by the data input to the system, with relatively little external event processing. Their processing involves a sequence of actions on that data and the generation of an output. For example, a phone billing system will accept information about calls made by a customer, calculate the costs of these calls, and generate a bill for that customer.

By contrast, real-time systems are usually event-driven, with limited data processing. For example, a landline phone switching system responds to events such as “handset activated” by generating a dial tone, pressing keys on a handset by capturing the phone number, and so on.

5.4.1 Data-driven modeling

Data-driven models show the sequence of actions involved in processing input data and generating an associated output. They can be used during the analysis of requirements as they show end-to-end processing in a system. That is, they show the entire sequence of actions that takes place from an initial input being processed to the corresponding output, which is the system’s response.

Data-driven models were among the first graphical software models. In the 1970s, structured design methods used data-flow diagrams (DFDs) as a way to illustrate the processing steps in a system. Data-flow models are useful because tracking and documenting how data associated with a particular process moves through the system helps analysts and designers understand what is going on in the process. DFDs are simple and intuitive and so are more accessible to stakeholders than some other types of model. It is usually possible to explain them to potential system users, who can then participate in validating the model.

[Figure 5.14 An activity model of the insulin pump’s operation. Activities: Get sensor value, Compute sugar level, Calculate insulin delivery, Calculate pump commands, Control pump; data: Blood sugar sensor, Sensor data, Blood sugar level, Insulin requirement, Pump control commands, Insulin pump.]

Data-flow diagrams can be represented in the UML using the activity diagram type, described in Section 5.1. Figure 5.14 is a simple activity diagram that shows the chain of processing involved in the insulin pump software. You can see the processing steps, represented as activities (rounded rectangles), and the data flowing between these steps, represented as objects (rectangles).
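The chain of processing in Figure 5.14 can also be read as a simple data pipeline, where the output of each step is the input of the next. A sketch in Java, with the step names taken from the figure and all numeric details (values, units, thresholds) invented purely for illustration:

```java
// Illustrative pipeline for the Figure 5.14 processing chain:
// sensor value -> blood sugar level -> insulin requirement -> pump commands.
// The step names follow the figure; the arithmetic is a made-up placeholder,
// NOT the real insulin pump algorithm.
class InsulinPipeline {
    int getSensorValue() {
        return 240;                       // stubbed raw sensor reading
    }

    double computeSugarLevel(int sensorData) {
        return sensorData / 18.0;         // placeholder unit conversion
    }

    int calculateInsulinDelivery(double sugarLevel) {
        return sugarLevel > 10.0 ? 2 : 0; // placeholder dose rule
    }

    String calculatePumpCommands(int insulinRequirement) {
        return "DELIVER " + insulinRequirement; // command for the pump
    }

    String run() {
        int data = getSensorValue();
        double level = computeSugarLevel(data);
        int requirement = calculateInsulinDelivery(level);
        return calculatePumpCommands(requirement);
    }
}
```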

An alternative way of showing the sequence of processing in a system is to use UML sequence diagrams. You have seen how these diagrams can be used to model interaction, but if you draw these so that messages are only sent from left to right, then they show the sequential data processing in the system. Figure 5.15 illustrates this, using a sequence model of processing an order and sending it to a supplier.

Sequence models highlight objects in a system, whereas data-flow diagrams highlight the operations or activities. In practice, nonexperts seem to find data-flow diagrams more intuitive, but engineers prefer sequence diagrams.

[Figure 5.15 Order processing. Lifelines: Purchase officer, :Order, Supplier, Budget, «datastore» Orders; messages: Fillin(), Validate(), [validation ok] Update(amount), Save(), Send().]


[Figure 5.16 A state diagram of a microwave oven. States: Waiting (do: display time), Full power (do: set power = 600), Half power (do: set power = 300), Set time (do: get number, exit: set time), Enabled (do: display 'Ready'), Disabled (do: display 'Waiting'), Operation (do: operate oven); stimuli: Full power, Half power, Timer, Number, Door open, Door closed, Start, Cancel.]

5.4.2 Event-driven modeling

Event-driven modeling shows how a system responds to external and internal events. It is based on the assumption that a system has a finite number of states and that events (stimuli) may cause a transition from one state to another. For example, a system controlling a valve may move from a state “Valve open” to a state “Valve closed” when an operator command (the stimulus) is received. This view of a system is particularly appropriate for real-time systems. Event-driven modeling is used extensively when designing and documenting real-time systems (Chapter 21).

The UML supports event-based modeling using state diagrams, which are based on Statecharts (Harel 1987). State diagrams show system states and events that cause transitions from one state to another. They do not show the flow of data within the system but may include additional information on the computations carried out in each state.

I use an example of control software for a very simple microwave oven to illustrate event-driven modeling (Figure 5.16). Real microwave ovens are much more complex than this system, but the simplified system is easier to understand. This simple oven has a switch to select full or half power, a numeric keypad to input the cooking time, a start/stop button, and an alphanumeric display.


[Figure 5.17 A state model of the Operation state. Substates: Checking (do: check status), Cook (do: run generator), Done (do: buzzer on for 5 secs.), Alarm (do: display event); transitions: OK, Time, Timeout, Turntable fault, Emitter fault, Door open (to Disabled), Cancel (to Waiting).]

I have assumed that the sequence of actions in using the microwave is as follows:

1. Select the power level (either half power or full power).

2. Input the cooking time using a numeric keypad.

3. Press Start, and the food is cooked for the given time.

For safety reasons, the oven should not operate when the door is open, and, on completion of cooking, a buzzer is sounded. The oven has a simple display that is used to display various alerts and warning messages.

In UML state diagrams, rounded rectangles represent system states. They may include a brief description (following “do”) of the actions taken in that state. The labeled arrows represent stimuli that force a transition from one state to another. You can indicate start and end states using filled circles, as in activity diagrams.

From Figure 5.16, you can see that the system starts in a waiting state and responds initially to either the full-power or the half-power button. Users can change their minds after selecting one of these and may press the other button. The time is set and, if the door is closed, the Start button is enabled. Pushing this button starts the oven operation, and cooking takes place for the specified time. This is the end of the cooking cycle, and the system returns to the waiting state.
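The transitions just described can be encoded directly as a state machine in code. The following sketch uses the state and stimulus names from Figure 5.16; the enum-based encoding, and the omission of timing, display actions, and the internals of the Operation superstate, are simplifications made for illustration:

```java
// A much-simplified encoding of the Figure 5.16 microwave state machine.
// States and stimuli follow the figure; timing, display behavior, and the
// Operation superstate's substates are deliberately omitted.
class Microwave {
    enum State { WAITING, FULL_POWER, HALF_POWER, SET_TIME, ENABLED, DISABLED, OPERATION }
    enum Stimulus { FULL_POWER, HALF_POWER, TIMER, NUMBER, DOOR_OPEN, DOOR_CLOSED, START, CANCEL }

    State state = State.WAITING;

    void handle(Stimulus s) {
        switch (state) {
            case WAITING:
            case FULL_POWER:
            case HALF_POWER:
                // Users can change their minds between power settings.
                if (s == Stimulus.FULL_POWER) state = State.FULL_POWER;
                else if (s == Stimulus.HALF_POWER) state = State.HALF_POWER;
                else if (s == Stimulus.TIMER && state != State.WAITING) state = State.SET_TIME;
                break;
            case SET_TIME:
                // NUMBER stimuli keep the oven in Set time while digits arrive.
                if (s == Stimulus.DOOR_CLOSED) state = State.ENABLED;
                else if (s == Stimulus.DOOR_OPEN) state = State.DISABLED;
                break;
            case DISABLED:
                if (s == Stimulus.DOOR_CLOSED) state = State.ENABLED;
                break;
            case ENABLED:
                if (s == Stimulus.START) state = State.OPERATION;
                else if (s == Stimulus.DOOR_OPEN) state = State.DISABLED;
                break;
            case OPERATION:
                // End of the cooking cycle, or a Cancel, returns to Waiting.
                if (s == Stimulus.CANCEL) state = State.WAITING;
                break;
        }
    }
}
```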

The problem with state-based modeling is that the number of possible states increases rapidly. For large system models, therefore, you need to hide detail in the models. One way to do this is by using the notion of a “superstate” that encapsulates a number of separate states. This superstate looks like a single state on a high-level model but is then expanded to show more detail on a separate diagram. To illustrate this concept, consider the Operation state in Figure 5.16. This is a superstate that can be expanded, as shown in Figure 5.17.


Figure 5.18 States and stimuli for the microwave oven

States:
Waiting: The oven is waiting for input. The display shows the current time.
Half power: The oven power is set to 300 watts. The display shows “Half power.”
Full power: The oven power is set to 600 watts. The display shows “Full power.”
Set time: The cooking time is set to the user’s input value. The display shows the cooking time selected and is updated as the time is set.
Disabled: Oven operation is disabled for safety. Interior oven light is on. Display shows “Not ready.”
Enabled: Oven operation is enabled. Interior oven light is off. Display shows “Ready to cook.”
Operation: Oven in operation. Interior oven light is on. Display shows the timer countdown. On completion of cooking, the buzzer is sounded for 5 seconds. Oven light is on. Display shows “Cooking complete” while buzzer is sounding.

Stimuli:
Half power: The user has pressed the half-power button.
Full power: The user has pressed the full-power button.
Timer: The user has pressed one of the timer buttons.
Number: The user has pressed a numeric key.
Door open: The oven door switch is not closed.
Door closed: The oven door switch is closed.
Start: The user has pressed the Start button.
Cancel: The user has pressed the Cancel button.

The Operation state includes a number of substates. It shows that operation starts with a status check and that if any problems are discovered an alarm is indicated and operation is disabled. Cooking involves running the microwave generator for the specified time; on completion, a buzzer is sounded. If the door is opened during operation, the system moves to the disabled state, as shown in Figure 5.17.

State models of a system provide an overview of event processing, but you normally have to extend this with a more detailed description of the stimuli and the system states. You may use a table to list the states and events that stimulate state transitions along with a description of each state and event. Figure 5.18 shows a tabular description of each state and how the stimuli that force state transitions are generated.

5.4.3 Model-driven engineering

Model-driven engineering (MDE) is an approach to software development whereby models rather than programs are the principal outputs of the development process (Brambilla, Cabot, and Wimmer 2012). The programs that execute on a hardware/software platform are generated automatically from the models. Proponents of MDE argue that this raises the level of abstraction in software engineering so that engineers no longer have to be concerned with programming language details or the specifics of execution platforms.

Model-driven engineering was developed from the idea of model-driven architecture (MDA). This was proposed by the Object Management Group (OMG) as a new software development paradigm (Mellor, Scott, and Weise 2004). MDA focuses on the design and implementation stages of software development, whereas MDE is concerned with all aspects of the software engineering process. Therefore, topics such as model-based requirements engineering, software processes for model-based development, and model-based testing are part of MDE but are not considered in MDA.

MDA as an approach to system engineering has been adopted by a number of large companies to support their development processes. This section focuses on the use of MDA for software implementation rather than discussing more general aspects of MDE. The take-up of more general model-driven engineering has been slow, and few companies have adopted this approach throughout their software development life cycle. In his blog, den Haan discusses possible reasons why MDE has not been widely adopted (den Haan 2011).

5.5 Model-driven architecture

Model-driven architecture (Mellor, Scott, and Weise 2004; Stahl and Voelter 2006) is a model-focused approach to software design and implementation that uses a subset of UML models to describe a system. Here, models at different levels of abstraction are created. From a high-level, platform-independent model, it is possible, in principle, to generate a working program without manual intervention.

The MDA method recommends that three types of abstract system model should be produced:

1. A computation independent model (CIM) CIMs model the important domain abstractions used in a system and so are sometimes called domain models. You may develop several different CIMs, reflecting different views of the system. For example, there may be a security CIM, in which you identify important security abstractions such as an asset and a role, and a patient record CIM, in which you describe abstractions such as patients and consultations.

2. A platform-independent model (PIM) PIMs model the operation of the system without reference to its implementation. A PIM is usually described using UML models that show the static system structure and how it responds to external and internal events.


[Figure 5.19 MDA transformations: Computation independent model → Translator (domain specific guidelines) → Platform independent model → Translator (platform specific patterns and rules) → Platform specific model → Translator (language specific patterns) → Executable code.]

3. Platform-specific models (PSM) PSMs are transformations of the platform-independent model with a separate PSM for each application platform. In principle, there may be layers of PSM, with each layer adding some platform-specific detail. So, the first-level PSM could be middleware-specific but database-independent. When a specific database has been chosen, a database-specific PSM can then be generated.

Model-based engineering allows engineers to think about systems at a high level of abstraction, without concern for the details of their implementation. This reduces the likelihood of errors, speeds up the design and implementation process, and allows for the creation of reusable, platform-independent application models. By using powerful tools, system implementations can be generated for different platforms from the same model. Therefore, to adapt the system to some new platform technology, you write a model translator for that platform. When this is available, all platform-independent models can then be rapidly re-hosted on the new platform.

Fundamental to MDA is the notion that transformations between models can be defined and applied automatically by software tools, as illustrated in Figure 5.19. This diagram also shows a final level of automatic transformation where a transformation is applied to the PSM to generate the executable code that will run on the designated software platform. Therefore, in principle at least, executable software can be generated from a high-level system model.
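The final PSM-to-code step can be pictured as a model-to-text transformation: a translator walks the model and emits source text according to language-specific patterns. The following toy generator is purely illustrative; the model representation and the shape of the generated code are my own assumptions, not the format of any real MDA tool:

```java
import java.util.List;

// A toy model-to-text transformation: given a trivial class "model"
// (a name plus attribute names), emit Java source text. Real MDA tools
// use rich metamodels and pattern libraries; this only illustrates the
// idea of generating code from a platform-specific model.
class ClassModel {
    final String name;
    final List<String> attributes;

    ClassModel(String name, List<String> attributes) {
        this.name = name;
        this.attributes = attributes;
    }
}

class JavaCodeGenerator {
    String generate(ClassModel model) {
        StringBuilder src = new StringBuilder();
        src.append("class ").append(model.name).append(" {\n");
        for (String attr : model.attributes) {
            // Language-specific pattern: each attribute becomes a String field.
            src.append("    String ").append(attr).append(";\n");
        }
        src.append("}\n");
        return src.toString();
    }
}
```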

In practice, completely automated translation of models to code is rarely possible. The translation of high-level CIM to PIM models remains a research problem, and for production systems, human intervention, illustrated using a stick figure in Figure 5.19, is normally required. A particularly difficult problem for automated model transformation is the need to link the concepts used in different CIMs. For example, the concept of a role in a security CIM that includes role-driven access control may have to be mapped onto the concept of a staff member in a hospital CIM. Only a person who understands both security and the hospital environment can make this mapping.


[Figure 5.20 Multiple platform-specific models: a Platform independent model is translated by a J2EE Translator into a J2EE specific model, from which a Java code generator produces a Java program; the same model is translated by a .NET Translator into a .NET specific model, from which a C# code generator produces a C# program.]

The translation of platform-independent to platform-specific models is a simpler technical problem. Commercial tools and open-source tools (Koegel 2012) are available that provide translators from PIMs to common platforms such as Java and J2EE. These use an extensive library of platform-specific rules and patterns to convert a PIM to a PSM. There may be several PSMs for each PIM in the system. If a software system is intended to run on different platforms (e.g., J2EE and .NET), then, in principle, you only have to maintain a single PIM. The PSMs for each platform are automatically generated (Figure 5.20).

Although MDA support tools include platform-specific translators, these sometimes only offer partial support for translating PIMs to PSMs. The execution environment for a system is more than the standard execution platform, such as J2EE or Java. It also includes other application systems, specific application libraries that may be created for a company, external services, and user interface libraries. These vary from one company to another, so off-the-shelf tool support is not available that takes these into account. Therefore, when MDA is introduced into an organization, special-purpose translators may have to be created to make use of the facilities available in the local environment. This is one reason why many companies have been reluctant to take on model-driven approaches to development. They do not want to develop or maintain their own tools or to rely on small software companies, who may go out of business, for tool development. Without these specialist tools, model-based development requires additional manual coding, which reduces the cost-effectiveness of this approach.

I believe that there are several other reasons why MDA has not become a mainstream approach to software development.

1. Models are a good way of facilitating discussions about a software design. However, it does not always follow that the abstractions that are useful for discussions are the right abstractions for implementation. You may decide to use a completely different implementation approach that is based on the reuse of off-the-shelf application systems.

2. For most complex systems, implementation is not the major problem—requirements engineering, security and dependability, integration with legacy systems, and testing are all more significant. Consequently, the gains from the use of MDA are limited.

Executable UML

The fundamental notion behind model-driven engineering is that completely automated transformation of models to code should be possible. To achieve this, you have to be able to construct graphical models with clearly defined meanings that can be compiled to executable code. You also need a way of adding information to graphical models about the ways in which the operations defined in the model are implemented. This is possible using a subset of UML 2, called Executable UML or xUML (Mellor and Balcer 2002).

http://software-engineering-book.com/web/xuml/

3. The arguments for platform independence are only valid for large, long-lifetime systems, where the platforms become obsolete during a system’s lifetime. For software products and information systems that are developed for standard platforms, such as Windows and Linux, the savings from the use of MDA are likely to be outweighed by the costs of its introduction and tooling.

4. The widespread adoption of agile methods over the same period that MDA was evolving has diverted attention away from model-driven approaches.

The success stories for MDA (OMG 2012) have mostly come from companies that are developing systems products, which include both hardware and software. The software in these products has a long lifetime and may have to be modified to reflect changing hardware technologies. The domain of application (automotive, air traffic control, etc.) is often well understood and so can be formalized in a CIM.

Hutchinson and his colleagues (Hutchinson, Rouncefield, and Whittle 2012) report on the industrial use of MDA, and their work confirms that successes in the use of model-driven development have been in systems products. Their assessment suggests that companies have had mixed results when adopting this approach, but the majority of users report that using MDA has increased productivity and reduced maintenance costs. They found that MDA was particularly useful in facilitating reuse, and this led to major productivity improvements.

There is an uneasy relationship between agile methods and model-driven architecture. The notion of extensive up-front modeling contradicts the fundamental ideas in the agile manifesto, and I suspect that few agile developers feel comfortable with model-driven engineering. Ambler, a pioneer in the development of agile methods, suggests that some aspects of MDA can be used in agile processes (Ambler 2004) but considers automated code generation to be impractical. However, Zhang and Patel report on Motorola’s success in using agile development with automated code generation (Zhang and Patel 2011).


Key Points

A model is an abstract view of a system that deliberately ignores some system details. Complementary system models can be developed to show the system’s context, interactions, structure, and behavior.

Context models show how a system that is being modeled is positioned in an environment with other systems and processes. They help define the boundaries of the system to be developed.

Use case diagrams and sequence diagrams are used to describe the interactions between users and systems in the system being designed. Use cases describe interactions between a system and external actors; sequence diagrams add more information to these by showing interactions between system objects.

Structural models show the organization and architecture of a system. Class diagrams are used to define the static structure of classes in a system and their associations.

Behavioral models are used to describe the dynamic behavior of an executing system. This behavior can be modeled from the perspective of the data processed by the system or by the events that stimulate responses from a system.

Activity diagrams may be used to model the processing of data, where each activity represents one process step.

State diagrams are used to model a system’s behavior in response to internal or external events.

Model-driven engineering is an approach to software development in which a system is represented as a set of models that can be automatically transformed to executable code.

Further Reading

Any of the introductory books on the UML provide more information about the notation than I can cover here. UML has only changed slightly in the last few years, so although some of these books are almost 10 years old, they are still relevant.

Using UML: Software Engineering with Objects and Components, 2nd ed. This book is a short, readable introduction to the use of the UML in system specification and design. I think that it is excellent for learning and understanding the UML notation, although it is less comprehensive than the complete descriptions of UML found in the UML reference manual. (P. Stevens with R. Pooley, Addison-Wesley, 2006)

Model-driven Software Engineering in Practice. This is quite a comprehensive book on model-driven approaches with a focus on model-driven design and implementation. As well as the UML, it also covers the development of domain-specific modeling languages. (M. Brambilla, J. Cabot, and M. Wimmer. Morgan Claypool, 2012)


Website

PowerPoint slides for this chapter:
www.pearsonglobaleditions.com/Sommerville

Links to supporting videos:
http://software-engineering-book.com/videos/requirements-and-design/

Exercises

5.1. Scope creep can be defined as a continuous increase in the scope of a project that can significantly increase project cost. Explain how a proper model of the system context can help prevent scope creep.

5.2. The way in which a system boundary is defined and an appropriate context model is created may have serious implications on the complexity and cost of a project. Give two examples where this may be applicable.

5.3. You have been asked to develop a system that will help with planning large-scale events and parties such as weddings, graduation celebrations, and birthday parties. Using an activity diagram, model the process context for such a system that shows the activities involved in planning a party (booking a venue, organizing invitations, etc.) and the system elements that might be used at each stage.

5.4. For the Mentcare system, propose a set of use cases that illustrates the interactions between a doctor, who sees patients and prescribes medicine and treatments, and the Mentcare system.

5.5. Develop a sequence diagram showing the interactions involved when a student registers for a course in a university. Courses may have limited enrollment, so the registration process must include checks that places are available. Assume that the student accesses an electronic course catalog to find out about available courses.

5.6. Look carefully at how messages and mailboxes are represented in the email system that you use. Model the object classes that might be used in the system implementation to represent a mailbox and an email message.

5.7. Based on your experience with a bank ATM, draw an activity diagram that models the data processing involved when a customer withdraws cash from the machine.

5.8. Draw a sequence diagram for the same system. Explain why you might want to develop both activity and sequence diagrams when modeling the behavior of a system.

5.9. Draw state diagrams of the control software for:

an automatic washing machine that has different programs for different types of clothes;

the software for a DVD player;

the control software for the camera on your mobile phone. Ignore the flash if you have one on your phone.


5.10. In principle, it is possible to generate working programs from a high-level model without manual intervention when using model-driven architectures. Discuss some of the current challenges that stand in the way of the existence of completely automated translation tools.

References

Ambler, S. W. 2004. The Object Primer: Agile Model-Driven Development with UML 2.0, 3rd ed. Cambridge, UK: Cambridge University Press.

Ambler, S. W., and R. Jeffries. 2002. Agile Modeling: Effective Practices for Extreme Programming and the Unified Process. New York: John Wiley & Sons.

Booch, G., J. Rumbaugh, and I. Jacobson. 2005. The Unified Modeling Language User Guide, 2nd ed. Boston: Addison-Wesley.

Brambilla, M., J. Cabot, and M. Wimmer. 2012. Model-Driven Software Engineering in Practice. San Rafael, CA: Morgan Claypool.

Den Haan, J. 2011. “Why There Is No Future for Model Driven Development.” http://www.theenterprisearchitect.eu/archive/2011/01/25/why-there-is-no-future-for-model-driven-development/

Erickson, J., and K. Siau. 2007. “Theoretical and Practical Complexity of Modeling Methods.” Comm. ACM 50 (8): 46–51. doi:10.1145/1278201.1278205.

Harel, D. 1987. “Statecharts: A Visual Formalism for Complex Systems.” Sci. Comput. Programming 8 (3): 231–274. doi:10.1016/0167-6423(87)90035-9.

Hull, R., and R. King. 1987. “Semantic Database Modeling: Survey, Applications and Research Issues.” ACM Computing Surveys 19 (3): 201–260. doi:10.1145/45072.45073.

Hutchinson, J., M. Rouncefield, and J. Whittle. 2012. “Model-Driven Engineering Practices in Industry.” In 34th Int. Conf. on Software Engineering, 633–642. doi:10.1145/1985793.1985882.

Jacobsen, I., M. Christerson, P. Jonsson, and G. Overgaard. 1993. Object-Oriented Software Engineering. Wokingham, UK: Addison-Wesley.

Koegel, M. 2012. “EMF Tutorial: What Every Eclipse Developer Should Know about EMF.” http://eclipsesource.com/blogs/tutorials/emf-tutorial/

Mellor, S. J., and M. J. Balcer. 2002. Executable UML. Boston: Addison-Wesley.

Mellor, S. J., K. Scott, and D. Weise. 2004. MDA Distilled: Principles of Model-Driven Architecture. Boston: Addison-Wesley.

OMG. 2012. “Model-Driven Architecture: Success Stories.” http://www.omg.org/mda/products_success.htm

Rumbaugh, J., I. Jacobson, and G. Booch. 2004. The Unified Modelling Language Reference Manual, 2nd ed. Boston: Addison-Wesley.

Stahl, T., and M. Voelter. 2006. Model-Driven Software Development: Technology, Engineering, Management. New York: John Wiley & Sons.

Zhang, Y., and S. Patel. 2011. “Agile Model-Driven Development in Practice.” IEEE Software 28 (2): 84–91. doi:10.1109/MS.2010.85.

Chapter 6
Architectural design

Objectives

The objective of this chapter is to introduce the concepts of software architecture and architectural design. When you have read the chapter, you will:

- understand why the architectural design of software is important;
- understand the decisions that have to be made about the software architecture during the architectural design process;
- have been introduced to the idea of architectural patterns, well-tried ways of organizing software architectures that can be reused in system designs;
- understand how application-specific architectural patterns may be used in transaction processing and language processing systems.

Contents

6.1 Architectural design decisions

6.2 Architectural views

6.3 Architectural patterns

6.4 Application architectures


Architectural design is concerned with understanding how a software system should be organized and designing the overall structure of that system. In the model of the software development process that I described in Chapter 2, architectural design is the first stage in the software design process. It is the critical link between design and requirements engineering, as it identifies the main structural components in a system and the relationships between them. The output of the architectural design process is an architectural model that describes how the system is organized as a set of communicating components.

In agile processes, it is generally accepted that an early stage of the development process should focus on designing an overall system architecture. Incremental development of architectures is not usually successful. Refactoring components in response to changes is usually relatively easy; however, refactoring the system architecture is expensive because you may need to modify most system components to adapt them to the architectural changes.

To help you understand what I mean by system architecture, look at Figure 6.1. This diagram shows an abstract model of the architecture for a packing robot system. This robotic system can pack different kinds of objects. It uses a vision component to pick out objects on a conveyor, identify the type of object, and select the right kind of packaging. The system then moves objects from the delivery conveyor to be packaged. It places packaged objects on another conveyor. The architectural model shows these components and the links between them.
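To make the idea of an architecture as communicating components concrete, the structure in Figure 6.1 can be sketched as objects that hold references to one another. The component names follow the figure, but every method name and the message flow are invented for illustration; a real packing controller would be far richer.

```python
# Sketch of some of the Figure 6.1 components as communicating objects.
# The component names come from the figure; the interfaces and the
# classification logic are invented purely for illustration.

class VisionSystem:
    def identify(self, item):
        # A real system would run image recognition here.
        return {"object": item, "type": "box" if item.startswith("b") else "tube"}

class PackagingSelectionSystem:
    def select(self, object_type):
        # Choose the right kind of packaging for the identified object.
        return {"box": "carton", "tube": "sleeve"}[object_type]

class PackingSystem:
    def __init__(self, vision, selector):
        self.vision = vision      # link to the vision component
        self.selector = selector  # link to packaging selection
        self.packed = []          # items placed on the output conveyor

    def pack(self, item):
        info = self.vision.identify(item)
        packaging = self.selector.select(info["type"])
        self.packed.append((item, packaging))
        return packaging

robot = PackingSystem(VisionSystem(), PackagingSelectionSystem())
print(robot.pack("b-101"))  # a "box"-type object is packed in a carton
```

The point of the sketch is only that each component has a narrow interface and the architectural model records which components are linked, not how they work internally.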

In practice, there is a significant overlap between the processes of requirements engineering and architectural design. Ideally, a system specification should not include any design information. This ideal is unrealistic, however, except for very small systems. You need to identify the main architectural components, as these reflect the high-level features of the system. Therefore, as part of the requirements engineering process, you might propose an abstract system architecture where you associate groups of system functions or features with large-scale components or subsystems. You then use this decomposition to discuss the requirements and more detailed features of the system with stakeholders.

[Figure 6.1 The architecture of a packing robot control system. The model shows a Vision system, an Object identification system, an Arm controller, a Gripper controller, a Packaging selection system, a Packing system, and a Conveyor controller, with the links between these components.]

You can design software architectures at two levels of abstraction, which I call architecture in the small and architecture in the large:

1. Architecture in the small is concerned with the architecture of individual programs. At this level, we are concerned with the way that an individual program is decomposed into components. This chapter is mostly concerned with program architectures.

2. Architecture in the large is concerned with the architecture of complex enterprise systems that include other systems, programs, and program components. These enterprise systems may be distributed over different computers, which may be owned and managed by different companies. (I cover architecture in the large in Chapters 17 and 18.)

Software architecture is important because it affects the performance, robustness, distributability, and maintainability of a system (Bosch 2000). As Bosch explains, individual components implement the functional system requirements, but the dominant influence on the non-functional system characteristics is the system’s architecture. Chen et al. (Chen, Ali Babar, and Nuseibeh 2013) confirmed this in a study of “architecturally significant requirements” where they found that non-functional requirements had the most significant effect on the system’s architecture.

Bass et al. (Bass, Clements, and Kazman 2012) suggest that explicitly designing and documenting software architecture has three advantages:

1. Stakeholder communication The architecture is a high-level presentation of the system that may be used as a focus for discussion by a range of different stakeholders.

2. System analysis Making the system architecture explicit at an early stage in the system development requires some analysis. Architectural design decisions have a profound effect on whether or not the system can meet critical requirements such as performance, reliability, and maintainability.

3. Large-scale reuse An architectural model is a compact, manageable description of how a system is organized and how the components interoperate. The system architecture is often the same for systems with similar requirements and so can support large-scale software reuse. As I explain in Chapter 15, product-line architectures are an approach to reuse where the same architecture is reused across a range of related systems.


System architectures are often modeled informally using simple block diagrams, as in Figure 6.1. Each box in the diagram represents a component. Boxes within boxes indicate that the component has been decomposed to subcomponents. Arrows mean that data and/or control signals are passed from component to component in the direction of the arrows. You can see many examples of this type of architectural model in Booch’s handbook of software architecture (Booch 2014).

Block diagrams present a high-level picture of the system structure, which people from different disciplines, who are involved in the system development process, can readily understand. In spite of their widespread use, Bass et al. (Bass, Clements, and Kazman 2012) dislike informal block diagrams for describing an architecture. They claim that these informal diagrams are poor architectural representations, as they show neither the type of the relationships among system components nor the components’ externally visible properties.

The apparent contradictions between architectural theory and industrial practice arise because there are two ways in which an architectural model of a program is used:

1. As a way of encouraging discussions about the system design A high-level architectural view of a system is useful for communication with system stakeholders and project planning because it is not cluttered with detail. Stakeholders can relate to it and understand an abstract view of the system. They can then discuss the system as a whole without being confused by detail. The architectural model identifies the key components that are to be developed, so managers can start assigning people to plan the development of these systems.

2. As a way of documenting an architecture that has been designed The aim here is to produce a complete system model that shows the different components in a system, their interfaces, and their connections. The argument for such a model is that such a detailed architectural description makes it easier to understand and evolve the system.

Block diagrams are a good way of supporting communication between the people involved in the software design process. They are intuitive, and domain experts and software engineers can relate to them and participate in discussions about the system. Managers find them helpful in planning the project. For many projects, block diagrams are the only architectural description.

If the architecture of a system is to be documented in detail, it is better to use a more rigorous notation for architectural description. Various architectural description languages (Bass, Clements, and Kazman 2012) have been developed for this purpose. A more detailed and complete description means that there is less scope for misunderstanding the relationships between the architectural components. However, developing a detailed architectural description is an expensive and time-consuming process. It is practically impossible to know whether or not it is cost-effective, so this approach is not widely used.


[Figure 6.2 Architectural design decisions. The fundamental questions are:
1. Is there a generic application architecture that can act as a template for the system that is being designed?
2. How will the system be distributed across hardware cores or processors?
3. What architectural patterns or styles might be used?
4. What will be the fundamental approach used to structure the system?
5. How will the structural components in the system be decomposed into sub-components?
6. What strategy will be used to control the operation of the components in the system?
7. What architectural organization is best for delivering the non-functional requirements of the system?
8. How should the architecture of the system be documented?]

6.1 Architectural design decisions

Architectural design is a creative process in which you design a system organization that will satisfy the functional and non-functional requirements of a system. There is no formulaic architectural design process. It depends on the type of system being developed, the background and experience of the system architect, and the specific requirements for the system. Consequently, I think it is best to consider architectural design as a series of decisions to be made rather than a sequence of activities.

During the architectural design process, system architects have to make a number of structural decisions that profoundly affect the system and its development process. Based on their knowledge and experience, they have to consider the fundamental questions shown in Figure 6.2.

Although each software system is unique, systems in the same application domain often have similar architectures that reflect the fundamental concepts of the domain. For example, application product lines are applications that are built around a core architecture with variants that satisfy specific customer requirements. When designing a system architecture, you have to decide what your system and broader application classes have in common, and decide how much knowledge from these application architectures you can reuse.

For embedded systems and apps designed for personal computers and mobile devices, you do not have to design a distributed architecture for the system. However, most large systems are distributed systems in which the system software is distributed across many different computers. The choice of distribution architecture is a key decision that affects the performance and reliability of the system. This is a major topic in its own right that I cover in Chapter 17.

The architecture of a software system may be based on a particular architectural pattern or style (these terms have come to mean the same thing). An architectural pattern is a description of a system organization (Garlan and Shaw 1993), such as a client–server organization or a layered architecture. Architectural patterns capture the essence of an architecture that has been used in different software systems. You should be aware of common patterns, where they can be used, and their strengths and weaknesses when making decisions about the architecture of a system. I cover several frequently used patterns in Section 6.3.

Garlan and Shaw’s notion of an architectural style covers questions 4 to 6 in the list of fundamental architectural questions shown in Figure 6.2. You have to choose the most appropriate structure, such as client–server or layered structuring, that will enable you to meet the system requirements. To decompose structural system units, you decide on a strategy for decomposing components into subcomponents. Finally, in the control modeling process, you develop a general model of the control relationships between the various parts of the system and make decisions about how the execution of components is controlled.

Because of the close relationship between non-functional system characteristics and software architecture, the choice of architectural style and structure should depend on the non-functional requirements of the system:

1. Performance If performance is a critical requirement, the architecture should be designed to localize critical operations within a small number of components, with these components deployed on the same computer rather than distributed across the network. This may mean using a few relatively large components rather than small, finer-grain components. Using large components reduces the number of component communications, as most of the interactions between related system features take place within a component. You may also consider runtime system organizations that allow the system to be replicated and executed on different processors.

2. Security If security is a critical requirement, a layered structure for the architecture should be used, with the most critical assets protected in the innermost layers and a high level of security validation applied to these layers.

3. Safety If safety is a critical requirement, the architecture should be designed so that safety-related operations are co-located in a single component or in a small number of components. This reduces the costs and problems of safety validation and may make it possible to provide related protection systems that can safely shut down the system in the event of failure.

4. Availability If availability is a critical requirement, the architecture should be designed to include redundant components so that it is possible to replace and update components without stopping the system. I describe fault-tolerant system architectures for high-availability systems in Chapter 11.

5. Maintainability If maintainability is a critical requirement, the system architecture should be designed using fine-grain, self-contained components that may readily be changed. Producers of data should be separated from consumers, and shared data structures should be avoided.

Obviously, there is potential conflict between some of these architectural choices. For example, using large components improves performance, and using small, fine-grain components improves maintainability. If both performance and maintainability are important system requirements, however, then some compromise must be found. You can sometimes do this by using different architectural patterns or styles for separate parts of the system. Security is now almost always a critical requirement, and you have to design an architecture that maintains security while also satisfying other non-functional requirements.

Evaluating an architectural design is difficult because the true test of an architecture is how well the system meets its functional and non-functional requirements when it is in use. However, you can do some evaluation by comparing your design against reference architectures or generic architectural patterns. Bosch’s description (Bosch 2000) of the non-functional characteristics of some architectural patterns can help with architectural evaluation.

6.2 Architectural views

I explained in the introduction to this chapter that architectural models of a software system can be used to focus discussion about the software requirements or design. Alternatively, they may be used to document a design so that it can be used as a basis for more detailed design and implementation of the system. In this section, I discuss two issues that are relevant to both of these:

1. What views or perspectives are useful when designing and documenting a system’s architecture?

2. What notations should be used for describing architectural models?

It is impossible to represent all relevant information about a system’s architecture in a single diagram, as a graphical model can only show one view or perspective of the system. It might show how a system is decomposed into modules, how the runtime processes interact, or the different ways in which system components are distributed across a network. Because all of these are useful at different times, for both design and documentation, you usually need to present multiple views of the software architecture.

[Figure 6.3 Architectural views: a system architecture can be presented as related logical, process, development, and physical views.]

There are different opinions as to what views are required. Kruchten (Kruchten 1995), in his well-known 4+1 view model of software architecture, suggests that there should be four fundamental architectural views, which can be linked through common use cases or scenarios (Figure 6.3). He suggests the following views:

1. A logical view, which shows the key abstractions in the system as objects or object classes. It should be possible to relate the system requirements to entities in this logical view.

2. A process view, which shows how, at runtime, the system is composed of interacting processes. This view is useful for making judgments about non-functional system characteristics such as performance and availability.

3. A development view, which shows how the software is decomposed for development; that is, it shows the breakdown of the software into components that are implemented by a single developer or development team. This view is useful for software managers and programmers.

4. A physical view, which shows the system hardware and how software components are distributed across the processors in the system. This view is useful for systems engineers planning a system deployment.

Hofmeister et al. (Hofmeister, Nord, and Soni 2000) suggest the use of similar views but add to this the notion of a conceptual view. This view is an abstract view of the system that can be the basis for decomposing high-level requirements into more detailed specifications, help engineers make decisions about components that can be reused, and represent a product line (discussed in Chapter 15) rather than a single system. Figure 6.1, which describes the architecture of a packing robot, is an example of a conceptual system view.

In practice, conceptual views of a system’s architecture are almost always developed during the design process. They are used to explain the system architecture to stakeholders and to inform architectural decision making. During the design process, some of the other views may also be developed when different aspects of the system are discussed, but it is rarely necessary to develop a complete description from all perspectives. It may also be possible to associate architectural patterns, discussed in the next section, with the different views of a system.


There are differing views about whether or not software architects should use the UML for describing and documenting software architectures. A survey in 2006 (Lange, Chaudron, and Muskens 2006) showed that, when the UML was used, it was mostly applied in an informal way. The authors of that paper argued that this was a bad thing.

I disagree with this view. The UML was designed for describing object-oriented systems, and, at the architectural design stage, you often want to describe systems at a higher level of abstraction. Object classes are too close to the implementation to be useful for architectural description. I don’t find the UML to be useful during the design process itself and prefer informal notations that are quicker to write and that can be easily drawn on a whiteboard. The UML is of most value when you are documenting an architecture in detail or using model-driven development, as discussed in Chapter 5.

A number of researchers (Bass, Clements, and Kazman 2012) have proposed the use of more specialized architectural description languages (ADLs) to describe system architectures. The basic elements of ADLs are components and connectors, and they include rules and guidelines for well-formed architectures. However, because ADLs are specialist languages, domain and application specialists find them hard to understand and use. There may be some value in using domain-specific ADLs as part of model-driven development, but I do not think they will become part of mainstream software engineering practice. Informal models and notations, such as the UML, will remain the most commonly used ways of documenting system architectures.
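The component-and-connector idea behind ADLs can be illustrated with a toy model. The classes and the single well-formedness rule here are invented for illustration only and are far simpler than any real ADL.

```python
# A toy component-and-connector model in the spirit of an ADL.
# All names and the well-formedness rule are invented for illustration.

class Architecture:
    def __init__(self):
        self.components = set()
        self.connectors = []  # (source, sink) pairs

    def add_component(self, name):
        self.components.add(name)

    def connect(self, source, sink):
        self.connectors.append((source, sink))

    def is_well_formed(self):
        # A simple rule: every connector must join two declared
        # components, and no declared component may be isolated.
        used = {c for pair in self.connectors for c in pair}
        return used <= self.components and self.components <= used

arch = Architecture()
for name in ("client", "server", "database"):
    arch.add_component(name)
arch.connect("client", "server")
arch.connect("server", "database")
print(arch.is_well_formed())  # True: all components declared and connected
```

Real ADLs go much further, typing the connectors and constraining component interfaces, but the basic vocabulary is the same.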

Users of agile methods claim that detailed design documentation is mostly unused. It is, therefore, a waste of time and money to develop these documents. I largely agree with this view, and I think that, except for critical systems, it is not worth developing a detailed architectural description from Kruchten’s four perspectives. You should develop the views that are useful for communication and not worry about whether or not your architectural documentation is complete.

6.3 Architectural patterns

The idea of patterns as a way of presenting, sharing, and reusing knowledge about software systems has been adopted in a number of areas of software engineering. The trigger for this was the publication of a book on object-oriented design patterns (Gamma et al. 1995). This prompted the development of other types of patterns, such as patterns for organizational design (Coplien and Harrison 2004), usability patterns (Usability Group 1998), patterns of cooperative interaction (Martin and Sommerville 2004), and configuration management patterns (Berczuk and Appleton 2002).

Architectural patterns were proposed in the 1990s under the name “architectural styles” (Shaw and Garlan 1996). A very detailed five-volume series of handbooks on pattern-oriented software architecture was published between 1996 and 2007 (Buschmann et al. 1996; Schmidt et al. 2000; Buschmann, Henney, and Schmidt 2007a, 2007b; Kircher and Jain 2004).

In this section, I introduce architectural patterns and briefly describe a selection of patterns that are commonly used. Patterns may be described in a standard way (Figures 6.4 and 6.5) using a mixture of narrative description and diagrams.


Name: MVC (Model-View-Controller)

Description: Separates presentation and interaction from the system data. The system is structured into three logical components that interact with each other. The Model component manages the system data and associated operations on that data. The View component defines and manages how the data is presented to the user. The Controller component manages user interaction (e.g., key presses, mouse clicks, etc.) and passes these interactions to the View and the Model. See Figure 6.5.

Example: Figure 6.6 shows the architecture of a web-based application system organized using the MVC pattern.

When used: Used when there are multiple ways to view and interact with data. Also used when the future requirements for interaction and presentation of data are unknown.

Advantages: Allows the data to change independently of its representation and vice versa. Supports presentation of the same data in different ways, with changes made in one representation shown in all of them.

Disadvantages: May involve additional code and code complexity when the data model and interactions are simple.

Figure 6.4 The Model-View-Controller (MVC) pattern

For more detailed information about patterns and their use, you should refer to the published pattern handbooks.

You can think of an architectural pattern as a stylized, abstract description of good practice, which has been tried and tested in different systems and environments. So, an architectural pattern should describe a system organization that has been successful in previous systems. It should include information on when it is and is not appropriate to use that pattern, and details of the pattern’s strengths and weaknesses.

Figure 6.4 describes the well-known Model-View-Controller pattern. This pattern is the basis of interaction management in many web-based systems and is supported by most language frameworks. The stylized pattern description includes the pattern

[Figure 6.5 The organization of the Model-View-Controller. The Controller maps user actions to model updates and selects the view; the View renders the model, requests model updates, and sends user events to the controller; the Model encapsulates the application state, answers state queries, and notifies the view of state changes.]


[Figure 6.6 Web application architecture using the MVC pattern. The browser sends user events to the Controller (HTTP request processing, application-specific logic, data validation) and displays forms produced by the View (dynamic page generation, forms management). The Controller sends update requests to the Model (business logic, database), which sends change notifications to the View; the View handles refresh requests and returns the form to display.]

name, a brief description, a graphical model, and an example of the type of system where the pattern is used. You should also include information about when the pattern should be used and its advantages and disadvantages.

Graphical models of the architecture associated with the MVC pattern are shown in Figures 6.5 and 6.6. These present the architecture from different views: Figure 6.5 is a conceptual view, and Figure 6.6 shows a runtime system architecture when this pattern is used for interaction management in a web-based system.
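The interactions in Figure 6.5 can be sketched in a few lines of Python. The division of responsibilities follows the pattern description; the class and method names are illustrative and do not correspond to any particular framework's API.

```python
# Minimal Model-View-Controller sketch following Figure 6.5:
# the Model encapsulates state and notifies views of changes;
# the Controller maps user events onto model updates.
# All names are illustrative, not a real framework's API.

class Model:
    def __init__(self):
        self.state = {}
        self.views = []

    def attach(self, view):
        self.views.append(view)

    def update(self, key, value):
        self.state[key] = value
        for view in self.views:       # change notification
            view.refresh(self)

class View:
    def __init__(self):
        self.rendered = None

    def refresh(self, model):         # render the model's current state
        self.rendered = dict(model.state)

class Controller:
    def __init__(self, model):
        self.model = model

    def handle(self, event, value):   # map a user event to a model update
        if event == "edit":
            self.model.update("text", value)

model = Model()
view = View()
model.attach(view)
Controller(model).handle("edit", "hello")
print(view.rendered)  # {'text': 'hello'}
```

Because the View only ever reads the Model and the Controller only ever writes it, a second View can be attached without touching either of the other components, which is exactly the advantage claimed in Figure 6.4.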

In this short space, it is impossible to describe all of the generic patterns that can be used in software development. Instead, I present some selected examples of patterns that are widely used and that capture good architectural design principles.

6.3.1 Layered architecture

The notions of separation and independence are fundamental to architectural design because they allow changes to be localized. The MVC pattern, shown in Figure 6.4, separates elements of a system, allowing them to change independently. For example, adding a new view or changing an existing view can be done without any changes to the underlying data in the model. The Layered Architecture pattern is another way of achieving separation and independence. This pattern is shown in Figure 6.7. Here, the system functionality is organized into separate layers, and each layer only relies on the facilities and services offered by the layer immediately beneath it.

This layered approach supports the incremental development of systems. As a layer is developed, some of the services provided by that layer may be made available to users. The architecture is also changeable and portable. If its interface is unchanged, a new layer with extended functionality can replace an existing layer without changing other parts of the system. Furthermore, when layer interfaces change or new facilities are added to a layer, only the adjacent layer is affected. As layered systems localize machine dependencies, this makes it easier to provide multi-platform implementations of an application system. Only the machine-dependent layers need be reimplemented to take account of the facilities of a different operating system or database.

Name: Layered architecture

Description: Organizes the system into layers, with related functionality associated with each layer. A layer provides services to the layer above it, so the lowest-level layers represent core services that are likely to be used throughout the system. See Figure 6.8.

Example: A layered model of a digital learning system to support learning of all subjects in schools (Figure 6.9).

When used: Used when building new facilities on top of existing systems; when the development is spread across several teams, with each team responsible for a layer of functionality; when there is a requirement for multilevel security.

Advantages: Allows replacement of entire layers as long as the interface is maintained. Redundant facilities (e.g., authentication) can be provided in each layer to increase the dependability of the system.

Disadvantages: In practice, providing a clean separation between layers is often difficult, and a high-level layer may have to interact directly with lower-level layers rather than through the layer immediately below it. Performance can be a problem because of multiple levels of interpretation of a service request as it is processed at each layer.

Figure 6.7 The Layered Architecture pattern

Figure 6.8 is an example of a layered architecture with four layers. The lowest layer includes system support software—typically, database and operating system support. The next layer is the application layer, which includes the components concerned with the application functionality and utility components used by other application components.

The third layer is concerned with user interface management and providing user authentication and authorization, with the top layer providing user interface facilities. Of course, the number of layers is arbitrary. Any of the layers in Figure 6.8 could be split into two or more layers.

[Figure 6.8 A generic layered architecture. From top to bottom: User interface; User interface management, Authentication and authorization; Core business logic/application functionality, System utilities; System support (OS, database, etc.).]
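The layering rule, in which each layer relies only on the layer immediately beneath it, can be sketched as objects that each hold a single reference to the layer below. The layer names loosely follow Figure 6.8; the operations are invented for illustration.

```python
# Sketch of a layered system in which each layer only calls the layer
# immediately beneath it (cf. Figure 6.8). The layer names loosely
# follow the figure; the operations are invented for illustration.

class SystemSupport:                      # lowest layer: OS/database support
    def fetch(self, key):
        return {"balance": 100}.get(key)

class Application:                        # business logic layer
    def __init__(self, below):
        self.below = below                # only reference: the layer beneath

    def get_balance(self):
        return self.below.fetch("balance")

class UserInterface:                      # top layer
    def __init__(self, below):
        self.below = below

    def show_balance(self):
        return f"Balance: {self.below.get_balance()}"

# Layers are stacked bottom-up. Replacing one layer (e.g., a different
# database in SystemSupport) does not affect the others as long as its
# interface is unchanged.
ui = UserInterface(Application(SystemSupport()))
print(ui.show_balance())  # Balance: 100
```

The design choice being illustrated is the narrow, one-way dependency: because `UserInterface` never touches `SystemSupport` directly, the machine-dependent bottom layer can be swapped for a multi-platform port without modifying the layers above.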


[Figure 6.9 The architecture of the iLearn system. A browser-based user interface and the iLearn app sit on top of configuration services (group management, application management, identity management); these use application services (email, messaging, video conferencing, newspaper archive, word processing, simulation, video storage, resource finder, spreadsheet, virtual learning environment, history archive), which in turn use utility services (authentication, logging and monitoring, interfacing, user storage, application storage, search).]

Figure 6.9 shows that the iLearn digital learning system, introduced in Chapter 1, has a four-layer architecture that follows this pattern. You can see another example of the Layered Architecture pattern in Figure 6.19 (Section 6.4), which shows the organization of the Mentcare system.

6.3.2 Repository architecture

The layered architecture and MVC patterns are examples of patterns where the view presented is the conceptual organization of a system. My next example, the Repository pattern (Figure 6.10), describes how a set of interacting components can share data.

Figure 6.10 The Repository pattern

Name: Repository

Description: All data in a system is managed in a central repository that is accessible to all system components. Components do not interact directly, only through the repository.

Example: Figure 6.11 is an example of an IDE where the components use a repository of system design information. Each software tool generates information, which is then available for use by other tools.

When used: You should use this pattern when you have a system in which large volumes of information are generated that have to be stored for a long time. You may also use it in data-driven systems where the inclusion of data in the repository triggers an action or tool.

Advantages: Components can be independent; they do not need to know of the existence of other components. Changes made by one component can be propagated to all components. All data can be managed consistently (e.g., backups done at the same time) as it is all in one place.

Disadvantages: The repository is a single point of failure, so problems in the repository affect the whole system. There may be inefficiencies in organizing all communication through the repository. Distributing the repository across several computers may be difficult.

180 Chapter 6 Architectural design

Figure 6.11 A repository architecture for an IDE. UML editors, code generators, a Java editor, a Python editor, a design translator, a design analyzer, and a report generator all share information through a central project repository.

The majority of systems that use large amounts of data are organized around a shared database or repository. This model is therefore suited to applications in which data is generated by one component and used by another. Examples of this type of system include command and control systems, management information systems, Computer-Aided Design (CAD) systems, and interactive development environments for software.

Figure 6.11 illustrates a situation in which a repository might be used. This diagram shows an IDE that includes different tools to support model-driven development. The repository in this case might be a version-controlled environment (as discussed in Chapter 25) that keeps track of changes to software and allows rollback to earlier versions.

Organizing tools around a repository is an efficient way of sharing large amounts of data. There is no need to transmit data explicitly from one component to another. However, components must operate around an agreed repository data model. Inevitably, this is a compromise between the specific needs of each tool, and it may be difficult or impossible to integrate new components if their data models do not fit the agreed schema. In practice, it may be difficult to distribute the repository over a number of machines. Although it is possible to distribute a logically centralized repository, this involves maintaining multiple copies of data. Keeping these consistent and up to date adds more overhead to the system.
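A minimal sketch of this arrangement, with invented tool and data names: the editor and the generator never reference each other; they communicate only through the shared repository.

```python
# Sketch of the Repository pattern: tools never call each other directly,
# they only read and write shared design information in the repository.

class Repository:
    def __init__(self):
        self._data = {}
    def put(self, name, item):
        self._data[name] = item
    def get(self, name):
        return self._data.get(name)

class DesignEditor:
    def save_design(self, repo):
        repo.put("design", {"classes": ["Account", "Customer"]})

class CodeGenerator:
    def generate(self, repo):
        design = repo.get("design")      # uses data another tool produced
        return [f"class {c}: pass" for c in design["classes"]]

repo = Repository()
DesignEditor().save_design(repo)         # the editor writes the design
code = CodeGenerator().generate(repo)    # the generator reads it independently
print(code)
```

Note that `CodeGenerator` depends only on the agreed data model (a dictionary with a `"classes"` key here), not on `DesignEditor`; that agreed model is exactly the compromise the text describes.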

In the repository architecture shown in Figure 6.11, the repository is passive, and control is the responsibility of the components using the repository. An alternative approach, which has been derived for artificial intelligence (AI) systems, uses a "blackboard" model that triggers components when particular data become available. This is appropriate when the data in the repository is unstructured. Decisions about which tool is to be activated can only be made when the data has been analyzed. This model was introduced by Nii (Nii 1986), and Bosch (Bosch 2000) includes a good discussion of how this style relates to system quality attributes.
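The blackboard variant can be sketched by making the repository active, so that posting data triggers any components subscribed to that kind of data. This is an illustrative sketch (all names invented), not a full blackboard system:

```python
# Sketch of the "blackboard" variant: the repository is active and
# notifies subscribed components when data of a given kind arrives.

class Blackboard:
    def __init__(self):
        self._data = {}
        self._subscribers = {}          # kind -> list of callbacks
    def subscribe(self, kind, callback):
        self._subscribers.setdefault(kind, []).append(callback)
    def post(self, kind, value):
        self._data[kind] = value
        for cb in self._subscribers.get(kind, []):
            cb(value)                   # trigger interested components

activated = []

def analyzer(raw):
    # a component activated only when "raw" data appears on the blackboard
    activated.append(("analyzer", raw.upper()))

bb = Blackboard()
bb.subscribe("raw", analyzer)
bb.post("raw", "sensor reading")        # posting the data triggers the analyzer
print(activated)
```

The component producing the data does not decide which tools run next; the blackboard's subscriptions do, which is what distinguishes this from the passive repository of Figure 6.11.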

6.3.3 Client–server architecture

The Repository pattern is concerned with the static structure of a system and does not show its runtime organization. My next example, the Client–Server pattern (Figure 6.12), illustrates a commonly used runtime organization for distributed systems.

Figure 6.12 The Client–Server pattern

Name: Client–server

Description: In a client–server architecture, the system is presented as a set of services, with each service delivered by a separate server. Clients are users of these services and access servers to make use of them.

Example: Figure 6.13 is an example of a film and video/DVD library organized as a client–server system.

When used: Used when data in a shared database has to be accessed from a range of locations. Because servers can be replicated, may also be used when the load on a system is variable.

Advantages: The principal advantage of this model is that servers can be distributed across a network. General functionality (e.g., a printing service) can be available to all clients and does not need to be implemented by all services.

Disadvantages: Each service is a single point of failure and so is susceptible to denial-of-service attacks or server failure. Performance may be unpredictable because it depends on the network as well as the system. Management problems may arise if servers are owned by different organizations.

A system that follows the Client–Server pattern is organized as a set of services and associated servers, and clients that access and use the services. The major components of this model are:

1. A set of servers that offer services to other components. Examples of servers include print servers that offer printing services, file servers that offer file management services, and a compile server that offers programming language compilation services. Servers are software components, and several servers may run on the same computer.

2. A set of clients that call on the services offered by servers. There will normally be several instances of a client program executing concurrently on different computers.

3. A network that allows the clients to access these services. Client–server systems are usually implemented as distributed systems, connected using Internet protocols.

Client–server architectures are usually thought of as distributed systems architectures, but the logical model of independent services running on separate servers can be implemented on a single computer. Again, an important benefit is separation and independence. Services and servers can be changed without affecting other parts of the system.

Clients may have to know the names of the available servers and the services they provide. However, servers do not need to know the identity of clients or how many clients are accessing their services. Clients access the services provided by a server through remote procedure calls using a request–reply protocol (such as HTTP), where a client makes a request to a server and waits until it receives a reply from that server.
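The shape of this request–reply exchange can be sketched in process, with invented service names; in a real system the request would travel over a network protocol such as HTTP, but the asymmetry is the same: the client names the server, while the server knows nothing about its clients.

```python
# In-process sketch of client-server request-reply. Servers offer named
# services; a client sends a request and blocks until it gets the reply.

class Server:
    def __init__(self, services):
        self.services = services        # service name -> handler function
    def handle(self, request):
        service, payload = request
        return self.services[service](payload)

class Client:
    def __init__(self, servers):
        self.servers = servers          # clients know the server names...
    def request(self, server_name, service, payload):
        # ...but servers hold no state about any client
        return self.servers[server_name].handle((service, payload))

catalog = Server({"search": lambda title: f"found: {title}"})
client = Client({"catalog": catalog})
print(client.request("catalog", "search", "Metropolis"))  # -> found: Metropolis
```

Because the `Server` objects are self-contained, several of them could run in one process, one per process, or one per machine without changing the client's code, which mirrors the text's point that the logical model is independent of the physical distribution.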

Figure 6.13 A client–server architecture for a film library. Clients 1–4 access, via the Internet, a catalog server (library catalogue), a video server (film store), a picture server (photo store), and a web server (film and photo information).

Figure 6.13 is an example of a system that is based on the client–server model. This is a multiuser, web-based system for providing a film and photograph library. In this system, several servers manage and display the different types of media. Video frames need to be transmitted quickly and in synchrony but at relatively low resolution. They may be compressed in a store, so the video server can handle video compression and decompression in different formats. Still pictures, however, must be maintained at a high resolution, so it is appropriate to maintain them on a separate server.

The catalog must be able to deal with a variety of queries and provide links into the web information system that includes data about the film and video clips, and an e-commerce system that supports the sale of photographs, film, and video clips. The client program is simply an integrated user interface, constructed using a web browser, to access these services.

The most important advantage of the client–server model is that it is a distributed architecture. Effective use can be made of networked systems with many distributed processors. It is easy to add a new server and integrate it with the rest of the system or to upgrade servers transparently without affecting other parts of the system. I cover distributed architectures in Chapter 17, where I explain the client–server model and its variants in more detail.

6.3.4 Pipe and filter architecture

My final example of a general Architectural pattern is the Pipe and Filter pattern (Figure 6.14). This is a model of the runtime organization of a system where functional transformations process their inputs and produce outputs. Data flows from one to another and is transformed as it moves through the sequence. Each processing step is implemented as a transform. Input data flows through these transforms until converted to output. The transformations may execute sequentially or in parallel. The data can be processed by each transform item by item or in a single batch.

Figure 6.14 The Pipe and Filter pattern

Name: Pipe and filter

Description: The processing of the data in a system is organized so that each processing component (filter) is discrete and carries out one type of data transformation. The data flows (as in a pipe) from one component to another for processing.

Example: Figure 6.15 is an example of a pipe and filter system used for processing invoices.

When used: Commonly used in data-processing applications (both batch and transaction-based) where inputs are processed in separate stages to generate related outputs.

Advantages: Easy to understand and supports transformation reuse. Workflow style matches the structure of many business processes. Evolution by adding transformations is straightforward. Can be implemented as either a sequential or concurrent system.

Disadvantages: The format for data transfer has to be agreed between communicating transformations. Each transformation must parse its input and unparse its output to the agreed form. This increases system overhead and may mean that it is impossible to reuse architectural components that use incompatible data structures.

The name "pipe and filter" comes from the original Unix system where it was possible to link processes using "pipes." These passed a text stream from one process to another. Systems that conform to this model can be implemented by combining Unix commands, using pipes and the control facilities of the Unix shell. The term filter is used because a transformation "filters out" the data it can process from its input data stream.

Variants of this pattern have been in use since computers were first used for automatic data processing. When transformations are sequential with data processed in batches, this pipe and filter architectural model becomes a batch sequential model, a common architecture for data-processing systems such as billing systems. The architecture of an embedded system may also be organized as a process pipeline, with each process executing concurrently. I cover use of this pattern in embedded systems in Chapter 21.

An example of this type of system architecture, used in a batch processing application, is shown in Figure 6.15. An organization has issued invoices to customers. Once a week, payments that have been made are reconciled with the invoices. For those invoices that have been paid, a receipt is issued. For those invoices that have not been paid within the allowed payment time, a reminder is issued.

Figure 6.15 An example of the pipe and filter architecture. Processing stages: read issued invoices, identify payments, issue receipts, find payments due, issue payment reminder; data: invoices, payments, receipts, reminders.

Architectural patterns for control

There are specific Architectural patterns that reflect commonly used ways of organizing control in a system. These include centralized control, based on one component calling other components, and event-based control, where the system reacts to external events.

http://software-engineering-book.com/web/archpatterns/
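The invoice reconciliation of Figure 6.15 can be sketched with generator functions acting as filters; the record layout and field names below are assumptions for illustration.

```python
# Sketch of the Figure 6.15 pipeline as pipe and filter, using Python
# generators as filters. Each filter consumes a stream and yields a stream.

invoices = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
payments = {1: 100}                    # invoice id -> amount paid

def read_invoices(invoices):
    for inv in invoices:
        yield inv

def identify_payments(stream):
    for inv in stream:
        inv["paid"] = payments.get(inv["id"]) == inv["amount"]
        yield inv

def issue_outputs(stream):
    for inv in stream:                 # paid -> receipt, unpaid -> reminder
        kind = "receipt" if inv["paid"] else "reminder"
        yield (kind, inv["id"])

# Composing the filters: data flows through the pipe one item at a time.
results = list(issue_outputs(identify_payments(read_invoices(invoices))))
print(results)                         # -> [('receipt', 1), ('reminder', 2)]
```

Each stage only agrees on the record format with its neighbors, so a new filter (say, one that flags overdue invoices) could be spliced into the chain without changing the others.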

Pipe and filter systems are best suited to batch processing systems and embedded systems where there is limited user interaction. Interactive systems are difficult to write using the pipe and filter model because of the need for a stream of data to be processed. While simple textual input and output can be modeled in this way, graphical user interfaces have more complex I/O formats and a control strategy that is based on events such as mouse clicks or menu selections. It is difficult to implement this as a sequential stream that conforms to the pipe and filter model.

6.4 Application architectures

Application systems are intended to meet a business or an organizational need. All businesses have much in common: they need to hire people, issue invoices, keep accounts, and so on. Businesses operating in the same sector use common sector-specific applications. Therefore, as well as general business functions, all phone companies need systems to connect and meter calls, manage their network, and issue bills to customers. Consequently, the application systems used by these businesses also have much in common.

These commonalities have led to the development of software architectures that describe the structure and organization of particular types of software systems. Application architectures encapsulate the principal characteristics of a class of systems. For example, in real-time systems, there might be generic architectural models of different system types, such as data collection systems or monitoring systems. Although instances of these systems differ in detail, the common architectural structure can be reused when developing new systems of the same type.

The application architecture may be reimplemented when developing new systems. However, for many business systems, application architecture reuse is implicit when generic application systems are configured to create a new application. We see this in the widespread use of Enterprise Resource Planning (ERP) systems and off-the-shelf configurable application systems, such as systems for accounting and stock control. These systems have a standard architecture and components. The components are configured and adapted to create a specific business application.

Application architectures

There are several examples of application architectures on the book's website. These include descriptions of batch data-processing systems, resource allocation systems, and event-based editing systems.

http://software-engineering-book.com/web/apparch/

For example, a system for supply chain management can be adapted for different types of suppliers, goods, and contractual arrangements.

As a software designer, you can use models of application architectures in a number of ways:

1. As a starting point for the architectural design process. If you are unfamiliar with the type of application that you are developing, you can base your initial design on a generic application architecture. You then specialize this for the specific system that is being developed.

2. As a design checklist. If you have developed an architectural design for an application system, you can compare this with the generic application architecture. You can check that your design is consistent with the generic architecture.

3. As a way of organizing the work of the development team. The application architectures identify stable structural features of the system architectures, and in many cases, it is possible to develop these in parallel. You can assign work to group members to implement different components within the architecture.

4. As a means of assessing components for reuse. If you have components you might be able to reuse, you can compare these with the generic structures to see whether there are comparable components in the application architecture.

5. As a vocabulary for talking about applications. If you are discussing a specific application or trying to compare applications, then you can use the concepts identified in the generic architecture to talk about these applications.

There are many types of application system, and, in some cases, they may seem to be very different. However, superficially dissimilar applications may have much in common and thus share an abstract application architecture. I illustrate this by describing the architectures of two types of application:

1. Transaction processing applications. Transaction processing applications are database-centered applications that process user requests for information and update the information in a database. These are the most common types of interactive business systems. They are organized in such a way that user actions can't interfere with each other and the integrity of the database is maintained. This class of system includes interactive banking systems, e-commerce systems, information systems, and booking systems.

Figure 6.16 The structure of transaction processing applications: I/O processing, application logic, transaction manager, database.

2. Language processing systems. Language processing systems are systems in which the user's intentions are expressed in a formal language, such as a programming language. The language processing system processes this language into an internal format and then interprets this internal representation. The best-known language processing systems are compilers, which translate high-level language programs into machine code. However, language processing systems are also used to interpret command languages for databases and information systems, and markup languages such as XML.

I have chosen these particular types of system because a large number of web-based business systems are transaction processing systems, and all software development relies on language processing systems.

6.4.1 Transaction processing systems

Transaction processing systems are designed to process user requests for information from a database, or requests to update a database (Lewis, Bernstein, and Kifer 2003). Technically, a database transaction is a sequence of operations that is treated as a single unit (an atomic unit). All of the operations in a transaction have to be completed before the database changes are made permanent. This ensures that failure of operations within a transaction does not lead to inconsistencies in the database.

From a user perspective, a transaction is any coherent sequence of operations that satisfies a goal, such as "find the times of flights from London to Paris." If the user transaction does not require the database to be changed, then it may not be necessary to package this as a technical database transaction.

An example of a database transaction is a customer request to withdraw money from a bank account using an ATM. This involves checking the customer account balance to see if sufficient funds are available, modifying the balance by the amount withdrawn, and sending commands to the ATM to deliver the cash. Until all of these steps have been completed, the transaction is incomplete and the customer accounts database is not changed.
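The all-or-nothing behavior of such a transaction can be demonstrated with Python's built-in sqlite3 module; the account schema and amounts below are invented for illustration.

```python
# Sketch of the ATM withdrawal as an atomic database transaction.
# Either every step commits or none does.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 50)")
conn.commit()

def withdraw(conn, account_id, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            row = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (account_id,)
            ).fetchone()
            if row[0] < amount:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
        return True     # only now would the ATM be told to dispense the cash
    except ValueError:
        return False

assert withdraw(conn, 1, 30) is True       # balance 50 -> 20
assert withdraw(conn, 1, 100) is False     # refused; balance unchanged
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)                             # -> 20
```

The failed withdrawal leaves the balance exactly as it was, which is the consistency guarantee the text describes: the database is not changed until the whole transaction completes.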

Transaction processing systems are usually interactive systems in which users make asynchronous requests for service. Figure 6.16 illustrates the conceptual architectural structure of transaction processing applications. First, a user makes a request to the system through an I/O processing component. The request is processed by some application-specific logic. A transaction is created and passed to a transaction manager, which is usually embedded in the database management system. After the transaction manager has ensured that the transaction is properly completed, it signals to the application that processing has finished.

Transaction processing systems may be organized as a "pipe and filter" architecture, with system components responsible for input, processing, and output. For example, consider a banking system that allows customers to query their accounts and withdraw cash from an ATM. The system is composed of two cooperating software components: the ATM software and the account processing software in the bank's database server. The input and output components are implemented as software in the ATM, and the processing component is part of the bank's database server. Figure 6.17 shows the architecture of this system, illustrating the functions of the input, process, and output components.

Figure 6.17 The software architecture of an ATM system. Input (ATM): get customer account id, validate card, select service; process (database): query account, update account; output (ATM): print details, return card, dispense cash.

6.4.2 Information systems

All systems that involve interaction with a shared database can be considered to be transaction-based information systems. An information system allows controlled access to a large base of information, such as a library catalog, a flight timetable, or the records of patients in a hospital. Information systems are almost always web-based systems, where the user interface is implemented in a web browser.

Figure 6.18 presents a very general model of an information system. The system is modeled using a layered approach (discussed in Section 6.3) where the top layer supports the user interface and the bottom layer is the system database. The user communications layer handles all input and output from the user interface, and the information retrieval layer includes application-specific logic for accessing and updating the database. The layers in this model can map directly onto servers in a distributed Internet-based system.

Figure 6.18 Layered information system architecture. From top to bottom: user interface; user communications, authentication and authorization; information retrieval and modification; transaction management; database.

Figure 6.19 The architecture of the Mentcare system. Web browser; login, role checking, form and menu manager, data validation; security management, patient info. manager, data import and export, report generation; transaction management; patient database.

As an example of an instantiation of this layered model, Figure 6.19 shows the architecture of the Mentcare system. Recall that this system maintains and manages details of patients who are consulting specialist doctors about mental health problems. I have added detail to each layer in the model by identifying the components that support user communications and information retrieval and access:

1. The top layer is a browser-based user interface.

2. The second layer provides the user interface functionality that is delivered through the web browser. It includes components to allow users to log in to the system and checking components that ensure that the operations they use are allowed by their role. This layer includes form and menu management components that present information to users, and data validation components that check information consistency.

3. The third layer implements the functionality of the system and provides components that implement system security, patient information creation and updating, import and export of patient data from other databases, and report generators that create management reports.

4. Finally, the lowest layer, which is built using a commercial database management system, provides transaction management and persistent data storage.

Information and resource management systems are sometimes also transaction processing systems. For example, e-commerce systems are Internet-based resource management systems that accept electronic orders for goods or services and then arrange delivery of these goods or services to the customer. In an e-commerce system, the application-specific layer includes additional functionality supporting a "shopping cart" in which users can place a number of items in separate transactions, then pay for them all together in a single transaction.

Figure 6.20 The architecture of a language processing system. Source language instructions pass through a translator (check syntax, check semantics, generate) to produce abstract machine instructions, which an interpreter (fetch, execute) processes with data to produce results.

The organization of servers in these systems usually reflects the four-layer generic model presented in Figure 6.18. These systems are often implemented as distributed systems with a multitier client–server architecture:

1. The web server is responsible for all user communications, with the user interface implemented using a web browser;

2. The application server is responsible for implementing application-specific logic as well as information storage and retrieval requests;

3. The database server moves information to and from the database and handles transaction management.

Using multiple servers allows high throughput and makes it possible to handle thousands of transactions per minute. As demand increases, servers can be added at each level to cope with the extra processing involved.

6.4.3 Language processing systems

Language processing systems translate one language into an alternative representation of that language and, for programming languages, may also execute the resulting code. Compilers translate a programming language into machine code. Other language processing systems may translate an XML data description into commands to query a database or to an alternative XML representation. Natural language processing systems may translate one natural language to another, for example, French to Norwegian.

A possible architecture for a language processing system for a programming language is illustrated in Figure 6.20. The source language instructions define the program to be executed, and a translator converts these into instructions for an abstract machine. These instructions are then interpreted by another component that fetches the instructions for execution and executes them using (if necessary) data from the environment. The output of the process is the result of interpreting the instructions on the input data.

Figure 6.21 A repository architecture for a language processing system. An editor, lexical analyzer, syntax analyzer, semantic analyzer, formatter, optimizer, and code generator communicate through a repository holding the abstract syntax tree, grammar definition, symbol table, and output definition.

For many compilers, the interpreter is the system hardware that processes machine instructions, and the abstract machine is a real processor. However, for dynamically typed languages, such as Ruby or Python, the interpreter is a software component.

Programming language compilers that are part of a more general programming environment have a generic architecture (Figure 6.21) that includes the following components:

1. A lexical analyzer, which takes input language tokens and converts them into an internal form.

2. A symbol table, which holds information about the names of entities (variables, class names, object names, etc.) used in the text that is being translated.

3. A syntax analyzer, which checks the syntax of the language being translated. It uses a defined grammar of the language and builds a syntax tree.

4. A syntax tree, which is an internal structure representing the program being compiled.

5. A semantic analyzer, which uses information from the syntax tree and the symbol table to check the semantic correctness of the input language text.

6. A code generator, which "walks" the syntax tree and generates abstract machine code.

Other components might also be included that analyze and transform the syntax tree to improve efficiency and remove redundancy from the generated machine code.
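A toy version of some of these components, for a deliberately minimal grammar (integers, + and *): a lexical analyzer produces tokens, a recursive-descent syntax analyzer builds a syntax tree, and a code generator walks the tree to emit abstract stack-machine instructions. A symbol table and semantic analyzer are omitted for brevity, and the instruction names are invented.

```python
# Toy lexer -> syntax tree -> code generator pipeline for expressions
# such as "2+3*4". Not a real compiler, just the shape of the components.
import re

def lex(text):
    # lexical analysis: convert source characters into tokens
    return re.findall(r"\d+|[+*]", text)

def parse_expr(tokens):
    # expr := term ('+' term)* ; builds tuples as syntax tree nodes
    node = parse_term(tokens)
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        node = ("+", node, parse_term(tokens))
    return node

def parse_term(tokens):
    # term := number ('*' number)*
    node = ("num", int(tokens.pop(0)))
    while tokens and tokens[0] == "*":
        tokens.pop(0)
        node = ("*", node, ("num", int(tokens.pop(0))))
    return node

def generate(node, code):
    # code generation: walk the syntax tree, emit stack-machine operations
    if node[0] == "num":
        code.append(("PUSH", node[1]))
    else:
        generate(node[1], code)
        generate(node[2], code)
        code.append(("ADD" if node[0] == "+" else "MUL",))
    return code

tree = parse_expr(lex("2+3*4"))
code = generate(tree, [])
print(code)   # -> [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('MUL',), ('ADD',)]
```

Note how "*" binds tighter than "+" simply because `parse_term` consumes it before control returns to `parse_expr`; the grammar structure, not the code generator, determines evaluation order.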

Reference architectures

Reference architectures capture important features of system architectures in a domain. Essentially, they include everything that might be in an application architecture, although, in reality, it is very unlikely that any individual application would include all the features shown in a reference architecture. The main purpose of reference architectures is to evaluate and compare design proposals, and to educate people about architectural characteristics in that domain.

http://software-engineering-book.com/web/refarch/

In other types of language processing system, such as a natural language translator, there will be additional components such as a dictionary. The output of the system is a translation of the input text.

Figure 6.21 illustrates how a language processing system can be part of an integrated set of programming support tools. In this example, the symbol table and syntax tree act as a central information repository. Tools or tool fragments communicate through it. Other information that is sometimes embedded in tools, such as the grammar definition and the definition of the output format for the program, has been taken out of the tools and put into the repository. Therefore, a syntax-directed editor can check that the syntax of a program is correct as it is being typed. A program formatter can create listings of the program that highlight different syntactic elements and are therefore easier to read and understand.

Alternative Architectural patterns may be used in a language processing system (Garlan and Shaw 1993). Compilers can be implemented using a composite of a repository and a pipe and filter model. In a compiler architecture, the symbol table is a repository for shared data. The phases of lexical, syntactic, and semantic analysis are organized sequentially, as shown in Figure 6.22, and communicate through the shared symbol table.

This pipe and filter model of language compilation is effective in batch environments where programs are compiled and executed without user interaction; for example, in the translation of one XML document to another. It is less effective when a compiler is integrated with other language processing tools such as a structured editing system, an interactive debugger, or a program formatter. In this situation, changes from one component need to be reflected immediately in other components. It is better to organize the system around a repository, as shown in Figure 6.21, if you are implementing a general, language-oriented programming environment.

Figure 6.22 A pipe and filter compiler architecture. Lexical analysis, syntactic analysis, semantic analysis, and code generation run in sequence, communicating through the shared symbol table and syntax tree.

Key Points

A software architecture is a description of how a software system is organized. Properties of a system such as performance, security, and availability are influenced by the architecture used.

Architectural design decisions include decisions on the type of application, the distribution of the system, the architectural styles to be used, and the ways in which the architecture should be documented and evaluated.

Architectures may be documented from several different perspectives or views. Possible views include a conceptual view, a logical view, a process view, a development view, and a physical view.

Architectural patterns are a means of reusing knowledge about generic system architectures. They describe the architecture, explain when it may be used, and point out its advantages and disadvantages.

Commonly used Architectural patterns include model-view-controller, layered architecture, repository, client–server, and pipe and filter.

Generic models of application systems architectures help us understand the operation of applications, compare applications of the same type, validate application system designs, and assess large-scale components for reuse.

Transaction processing systems are interactive systems that allow information in a database to be remotely accessed and modified by a number of users. Information systems and resource management systems are examples of transaction processing systems.

Language processing systems are used to translate texts from one language into another and to carry out the instructions specified in the input language. They include a translator and an abstract machine that executes the generated language.

Further Reading

Software Architecture: Perspectives on an Emerging Discipline. This was the first book on software architecture and has a good discussion of different architectural styles that is still relevant. (M. Shaw and D. Garlan, 1996, Prentice-Hall).

“The Golden Age of Software Architecture.” This paper surveys the development of software architecture from its beginnings in the 1980s through to its usage in the 21st century. There is not a lot of technical content, but it is an interesting historical overview. (M. Shaw and P. Clements, IEEE Software, 23 (2), March–April 2006) http://dx.doi.org/10.1109/MS.2006.58.

Software Architecture in Practice (3rd ed.). This is a practical discussion of software architectures that does not oversell the benefits of architectural design. It provides a clear business rationale, explaining why architectures are important. (L. Bass, P. Clements, and R. Kazman, 2012, Addison-Wesley).


Handbook of Software Architecture. This is a work in progress by Grady Booch, one of the early evangelists for software architecture. He has been documenting the architectures of a range of software systems so that you can see reality rather than academic abstraction. Available on the web and intended to appear as a book. (G. Booch, 2014) http://www.handbookofsoftwarearchitecture.com/

Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/requirements-and-design/

Exercises

6.1. When describing a system, explain why you may have to start the design of the system architecture before the requirements specification is complete.

6.2. You have been asked to prepare and deliver a presentation to a nontechnical manager to justify the hiring of a system architect for a new project. Write a list of bullet points setting out the key points in your presentation in which you explain the importance of software architecture.

6.3. Performance and security can be conflicting non-functional requirements when architecting software systems. Make an argument in support of this statement.

6.4. Draw diagrams showing a conceptual view and a process view of the architectures of the following systems:

A ticket machine used by passengers at a railway station.

A computer-controlled video conferencing system that allows video, audio, and computer data to be visible to several participants at the same time.

A robot floor-cleaner that is intended to clean relatively clear spaces such as corridors. The cleaner must be able to sense walls and other obstructions.

6.5. A software system will be built to allow drones to autonomously herd cattle in farms. These drones can be remotely controlled by human operators. Explain how multiple architectural patterns can fit together to help build this kind of system.

6.6. Suggest an architecture for a system (such as iTunes) that is used to sell and distribute music on the Internet. What architectural patterns are the basis for your proposed architecture?

6.7. An information system is to be developed to maintain information about assets owned by a utility company, such as buildings, vehicles, and equipment. It is intended that this will be updatable by staff working in the field using mobile devices as new asset information becomes available. The company has several existing asset databases that should be integrated through this system. Design a layered architecture for this asset management system based on the generic information system architecture shown in Figure 6.18.

6.8. Using the generic model of a language processing system presented here, design the architecture of a system that accepts natural language commands and translates these into database queries in a language such as SQL.

6.9. Using the basic model of an information system, as presented in Figure 6.18, suggest the components that might be part of an information system that allows users to view box office events, available tickets and prices, and to eventually buy tickets.

6.10. Should there be a separate profession of ‘software architect’ whose role is to work independently with a customer to design the software system architecture? A separate software company would then implement the system. What might be the difficulties of establishing such a profession?

References

Bass, L., P. Clements, and R. Kazman. 2012. Software Architecture in Practice (3rd ed.). Boston: Addison-Wesley.

Berczuk, S. P., and B. Appleton. 2002. Software Configuration Management Patterns: Effective Teamwork, Practical Integration. Boston: Addison-Wesley.

Booch, G. 2014. “Handbook of Software Architecture.” http://handbookofsoftwarearchitecture.com/

Bosch, J. 2000. Design and Use of Software Architectures. Harlow, UK: Addison-Wesley.

Buschmann, F., K. Henney, and D. C. Schmidt. 2007a. Pattern-Oriented Software Architecture Volume 4: A Pattern Language for Distributed Computing. New York: John Wiley & Sons.

––––––. 2007b. Pattern-Oriented Software Architecture Volume 5: On Patterns and Pattern Languages. New York: John Wiley & Sons.

Buschmann, F., R. Meunier, H. Rohnert, and P. Sommerlad. 1996. Pattern-Oriented Software Architecture Volume 1: A System of Patterns. New York: John Wiley & Sons.

Chen, L., M. Ali Babar, and B. Nuseibeh. 2013. “Characterizing Architecturally Significant Requirements.” IEEE Software 30 (2): 38–45. doi:10.1109/MS.2012.174.

Coplien, J. O., and N. B. Harrison. 2004. Organizational Patterns of Agile Software Development. Englewood Cliffs, NJ: Prentice-Hall.

Gamma, E., R. Helm, R. Johnson, and J. Vlissides. 1995. Design Patterns: Elements of Reusable Object-Oriented Software. Reading, MA: Addison-Wesley.

Garlan, D., and M. Shaw. 1993. “An Introduction to Software Architecture.” In Advances in Software Engineering and Knowledge Engineering, edited by V. Ambriola and G. Tortora, 2:1–39. London: World Scientific Publishing Co.

Hofmeister, C., R. Nord, and D. Soni. 2000. Applied Software Architecture. Boston: Addison-Wesley.

Kircher, M., and P. Jain. 2004. Pattern-Oriented Software Architecture Volume 3: Patterns for Resource Management. New York: John Wiley & Sons.

Kruchten, P. 1995. “The 4+1 View Model of Software Architecture.” IEEE Software 12 (6): 42–50. doi:10.1109/52.469759.

Lange, C. F. J., M. R. V. Chaudron, and J. Muskens. 2006. “UML Software Architecture and Design Description.” IEEE Software 23 (2): 40–46. doi:10.1109/MS.2006.50.

Lewis, P. M., A. J. Bernstein, and M. Kifer. 2003. Databases and Transaction Processing: An Application-Oriented Approach. Boston: Addison-Wesley.

Martin, D., and I. Sommerville. 2004. “Patterns of Cooperative Interaction: Linking Ethnomethodology and Design.” ACM Transactions on Computer-Human Interaction 11 (1) (March 1): 59–89. doi:10.1145/972648.972651.

Nii, H. P. 1986. “Blackboard Systems, Parts 1 and 2.” AI Magazine 7 (2 and 3): 38–53 and 62–69. http://www.aaai.org/ojs/index.php/aimagazine/article/view/537/473

Schmidt, D., M. Stal, H. Rohnert, and F. Buschmann. 2000. Pattern-Oriented Software Architecture Volume 2: Patterns for Concurrent and Networked Objects. New York: John Wiley & Sons.

Shaw, M., and D. Garlan. 1996. Software Architecture: Perspectives on an Emerging Discipline. Englewood Cliffs, NJ: Prentice-Hall.

Usability Group. 1998. “Usability Patterns.” University of Brighton. http://www.it.bton.ac.uk/Research/patterns/home.html

7 Design and implementation

Objectives

The objectives of this chapter are to introduce object-oriented software design using the UML and highlight important implementation concerns. When you have read this chapter, you will:

understand the most important activities in a general, object-oriented design process;

understand some of the different models that may be used to document an object-oriented design;

know about the idea of design patterns and how these are a way of reusing design knowledge and experience;

have been introduced to key issues that have to be considered when implementing software, including software reuse and open-source development.

Contents

7.1 Object-oriented design using the UML

7.2 Design patterns

7.3 Implementation issues

7.4 Open-source development


Software design and implementation is the stage in the software engineering process at which an executable software system is developed. For some simple systems, software engineering means software design and implementation, and all other software engineering activities are merged with this process. However, for large systems, software design and implementation is only one of a number of software engineering processes (requirements engineering, verification and validation, etc.).

Software design and implementation activities are invariably interleaved. Software design is a creative activity in which you identify software components and their relationships, based on a customer’s requirements. Implementation is the process of realizing the design as a program. Sometimes there is a separate design stage, and this design is modeled and documented. At other times, a design is in the programmer’s head or roughly sketched on a whiteboard or sheets of paper. Design is about how to solve a problem, so there is always a design process. However, it isn’t always necessary or appropriate to describe the design in detail using the UML or other design description language.

Design and implementation are closely linked, and you should normally take implementation issues into account when developing a design. For example, using the UML to document a design may be the right thing to do if you are programming in an object-oriented language such as Java or C#. It is less useful, I think, if you are developing using a dynamically typed language like Python. There is no point in using the UML if you are implementing your system by configuring an off-the-shelf package. As I discussed in Chapter 3, agile methods usually work from informal sketches of the design and leave design decisions to programmers.

One of the most important implementation decisions that has to be made at an early stage of a software project is whether to build or to buy the application software. For many types of application, it is now possible to buy off-the-shelf application systems that can be adapted and tailored to the users’ requirements. For example, if you want to implement a medical records system, you can buy a package that is already used in hospitals. It is usually cheaper and faster to use this approach rather than developing a new system in a conventional programming language.

When you develop an application system by reusing an off-the-shelf product, the design process focuses on how to configure the system product to meet the application requirements. You don’t develop design models of the system, such as models of the system objects and their interactions. I discuss this reuse-based approach to development in Chapter 15.

I assume that most readers of this book have had experience of program design and implementation. This is something that you acquire as you learn to program and master the elements of a programming language like Java or Python. You will probably have learned about good programming practice in the programming languages that you have studied, as well as how to debug programs that you have developed. Therefore, I don’t cover programming topics here. Instead, this chapter has two aims:

1. To show how system modeling and architectural design (covered in Chapters 5 and 6) are put into practice in developing an object-oriented software design.

2. To introduce important implementation issues that are not usually covered in programming books. These include software reuse, configuration management, and open-source development.

As there are a vast number of different development platforms, the chapter is not biased toward any particular programming language or implementation technology. Therefore, I have presented all examples using the UML rather than a programming language such as Java or Python.

7.1 Object-oriented design using the UML

An object-oriented system is made up of interacting objects that maintain their own local state and provide operations on that state. The representation of the state is private and cannot be accessed directly from outside the object. Object-oriented design processes involve designing object classes and the relationships between these classes. These classes define the objects in the system and their interactions. When the design is realized as an executing program, the objects are created dynamically from these class definitions.

Objects include both data and operations to manipulate that data. They may therefore be understood and modified as stand-alone entities. Changing the implementation of an object or adding services should not affect other system objects. Because objects are associated with things, there is often a clear mapping between real-world entities (such as hardware components) and their controlling objects in the system. This improves the understandability, and hence the maintainability, of the design.
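The idea that an object hides its state behind operations can be sketched in a few lines; the class and its attributes below are my own illustrative invention, not part of the weather station design.

```python
class Counter:
    """An object with private local state, manipulated only via operations."""

    def __init__(self):
        self._count = 0  # private state: not accessed directly from outside

    def increment(self):
        self._count += 1

    def value(self):
        return self._count

# Changing the internal representation (say, keeping a history list instead
# of a single integer) would not affect code that only calls increment()
# and value(), which is why objects can be modified as stand-alone entities.
```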

To develop a system design from concept to detailed, object-oriented design, you need to:

1. Understand and define the context and the external interactions with the system.

2. Design the system architecture.

3. Identify the principal objects in the system.

4. Develop design models.

5. Specify interfaces.

Like all creative activities, design is not a clear-cut, sequential process. You develop a design by getting ideas, proposing solutions, and refining these solutions as information becomes available. You inevitably have to backtrack and retry when problems arise. Sometimes you explore options in detail to see if they work; at other times you ignore details until late in the process. Sometimes you use notations, such as the UML, precisely to clarify aspects of the design; at other times, notations are used informally to stimulate discussions.

I explain object-oriented software design by developing a design for part of the embedded software for the wilderness weather station that I introduced in Chapter 1. Wilderness weather stations are deployed in remote areas. Each weather station


Figure 7.1 System context for the weather station: a single control system and a single weather information system are each associated with 1..n weather stations, and each weather station is linked to one satellite.

records local weather information and periodically transfers this to a weather information system, using a satellite link.

7.1.1 System context and interactions

The first stage in any software design process is to develop an understanding of the relationships between the software that is being designed and its external environment. This is essential for deciding how to provide the required system functionality and how to structure the system to communicate with its environment. As I discussed in Chapter 5, understanding the context also lets you establish the boundaries of the system.

Setting the system boundaries helps you decide what features are implemented in the system being designed and what features are in other associated systems. In this case, you need to decide how functionality is distributed between the control system for all of the weather stations and the embedded software in the weather station itself.

System context models and interaction models present complementary views of the relationships between a system and its environment:

1. A system context model is a structural model that demonstrates the other systems in the environment of the system being developed.

2. An interaction model is a dynamic model that shows how the system interacts with its environment as it is used.

The context model of a system may be represented using associations. Associations simply show that there are some relationships between the entities involved in the association. You can document the environment of the system using a simple block diagram, showing the entities in the system and their associations. Figure 7.1 shows that the systems in the environment of each weather station are a weather information system, an onboard satellite system, and a control system. The cardinality information on the link shows that there is a single control system but several weather stations, one satellite, and one general weather information system.

When you model the interactions of a system with its environment, you should use an abstract approach that does not include too much detail. One way to do this is to use a use case model. As I discussed in Chapters 4 and 5, each use case represents


Weather station use cases

Report weather–send weather data to the weather information system

Report status–send status information to the weather information system

Restart–if the weather station is shut down, restart the system

Shutdown–shut down the weather station

Reconfigure–reconfigure the weather station software

Powersave–put the weather station into power-saving mode

Remote control–send control commands to any weather station subsystem

http://software-engineering-book.com/web/ws-use-cases/

an interaction with the system. Each possible interaction is named in an ellipse, and the external entity involved in the interaction is represented by a stick figure.

The use case model for the weather station is shown in Figure 7.2. This shows that the weather station interacts with the weather information system to report weather data and the status of the weather station hardware. Other interactions are with a control system that can issue specific weather station control commands. The stick figure is used in the UML to represent other systems as well as human users.

Each of these use cases should be described in structured natural language. This helps designers identify objects in the system and gives them an understanding of what the system is intended to do. I use a standard format for this description that clearly identifies what information is exchanged, how the interaction is initiated, and so on. As I explain in Chapter 21, embedded systems are often modeled by describing

systems are often modeled by describing Report

weather

Report status

Weather

information

system

Restart

Shutdown

Reconfigure

Control

system

Powersave

Remote

Figure 7.2 Weather

control

station use cases


Figure 7.3 Use case description—Report weather

System: Weather station

Use case: Report weather

Actors: Weather information system, Weather station

Data: The weather station sends a summary of the weather data that has been collected from the instruments in the collection period to the weather information system. The data sent are the maximum, minimum, and average ground and air temperatures; the maximum, minimum, and average air pressures; the maximum, minimum, and average wind speeds; the total rainfall; and the wind direction as sampled at 5-minute intervals.

Stimulus: The weather information system establishes a satellite communication link with the weather station and requests transmission of the data.

Response: The summarized data is sent to the weather information system.

Comments: Weather stations are usually asked to report once per hour, but this frequency may differ from one station to another and may be modified in future.

how they respond to internal or external stimuli. Therefore, the stimuli and associated responses should be listed in the description. Figure 7.3 shows the description of the Report weather use case from Figure 7.2 that is based on this approach.

7.1.2 Architectural design

Once the interactions between the software system and the system’s environment have been defined, you use this information as a basis for designing the system architecture. Of course, you need to combine this knowledge with your general knowledge of the principles of architectural design and with more detailed domain knowledge. You identify the major components that make up the system and their interactions. You may then design the system organization using an architectural pattern such as a layered or client–server model.

The high-level architectural design for the weather station software is shown in Figure 7.4. The weather station is composed of independent subsystems that communicate

Figure 7.4 High-level architecture of the weather station: «subsystem» Fault manager, Configuration manager, Power manager, Communications, Data collection, and Instruments, all connected by a Communication link.


Figure 7.5 Architecture of the data collection system: Transmitter and Receiver objects communicating with a WeatherData object.

by broadcasting messages on a common infrastructure, shown as Communication link in Figure 7.4. Each subsystem listens for messages on that infrastructure and picks up the messages that are intended for it. This “listener model” is a commonly used architectural style for distributed systems.

When the communications subsystem receives a control command, such as shutdown, the command is picked up by each of the other subsystems, which then shut themselves down in the correct way. The key benefit of this architecture is that it is easy to support different configurations of subsystems because the sender of a message does not need to address the message to a particular subsystem.

Figure 7.5 shows the architecture of the data collection subsystem, which is included in Figure 7.4. The Transmitter and Receiver objects are concerned with managing communications, and the WeatherData object encapsulates the information that is collected from the instruments and transmitted to the weather information system. This arrangement follows the producer–consumer pattern, discussed in Chapter 21.
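As a minimal sketch of the producer–consumer arrangement (the pattern itself is covered in Chapter 21), an instrument can act as a producer depositing readings in a shared buffer from which a transmitter consumes. The function names and reading tuples here are assumptions for illustration only.

```python
import queue
import threading

buffer = queue.Queue()  # shared buffer between producer and consumer

def instrument(readings):
    # Producer: an instrument deposits its readings in the shared buffer.
    for r in readings:
        buffer.put(r)
    buffer.put(None)  # sentinel: no more data to come

def transmitter(collected):
    # Consumer: the transmitter takes readings from the buffer to send on.
    while True:
        r = buffer.get()
        if r is None:
            break
        collected.append(r)

collected = []
p = threading.Thread(target=instrument, args=([("temp", 12.5), ("wind", 7.0)],))
c = threading.Thread(target=transmitter, args=(collected,))
p.start(); c.start()
p.join(); c.join()
# collected now holds both readings in production order
```

The queue decouples the two sides: the instrument never waits for the transmitter, and either side can be replaced without changing the other.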

7.1.3 Object class identification

By this stage in the design process, you should have some ideas about the essential objects in the system that you are designing. As your understanding of the design develops, you refine these ideas about the system objects. The use case description helps to identify objects and operations in the system. From the description of the Report weather use case, it is obvious that you will need to implement objects representing the instruments that collect weather data and an object representing the summary of the weather data. You also usually need a high-level system object or objects that encapsulate the system interactions defined in the use cases. With these objects in mind, you can start to identify the general object classes in the system.

As object-oriented design evolved in the 1980s, various ways of identifying object classes in object-oriented systems were suggested:

1. Use a grammatical analysis of a natural language description of the system to be constructed. Objects and attributes are nouns; operations or services are verbs (Abbott 1983).

2. Use tangible entities (things) in the application domain such as aircraft, roles such as manager, events such as request, interactions such as meetings, locations


Figure 7.6 Weather station objects: WeatherStation (identifier; reportWeather ( ), reportStatus ( ), powerSave (instruments), remoteControl (commands), reconfigure (commands), restart (instruments), shutdown (instruments)); WeatherData (airTemperatures, groundTemperatures, windSpeeds, windDirections, pressures, rainfall; collect ( ), summarize ( )); Ground thermometer (gt_Ident, temperature; get ( ), test ( )); Anemometer (an_Ident, windSpeed, windDirection; get ( ), test ( )); Barometer (bar_Ident, pressure, height; get ( ), test ( )).

such as offices, organizational units such as companies, and so on (Wirfs-Brock, Wilkerson, and Weiner 1990).

3. Use a scenario-based analysis where various scenarios of system use are identified and analyzed in turn. As each scenario is analyzed, the team responsible for the analysis must identify the required objects, attributes, and operations (Beck and Cunningham 1989).

In practice, you have to use several knowledge sources to discover object classes. Object classes, attributes, and operations that are initially identified from the informal system description can be a starting point for the design. Information from application domain knowledge or scenario analysis may then be used to refine and extend the initial objects. This information can be collected from requirements documents, discussions with users, or analyses of existing systems. As well as the objects representing entities external to the system, you may also have to design “implementation objects” that are used to provide general services such as searching and validity checking.

In the wilderness weather station, object identification is based on the tangible hardware in the system. I don’t have space to include all the system objects here, but I have shown five object classes in Figure 7.6. The Ground thermometer, Anemometer, and Barometer objects are application domain objects, and the WeatherStation and WeatherData objects have been identified from the system description and the scenario (use case) description:

1. The WeatherStation object class provides the basic interface of the weather station with its environment. Its operations are based on the interactions shown in Figure 7.3. I use a single object class, and it includes all of these interactions. Alternatively, you could design the system interface as several different classes, with one class per interaction.


2. The WeatherData object class is responsible for processing the report weather command. It sends the summarized data from the weather station instruments to the weather information system.

3. The Ground thermometer, Anemometer, and Barometer object classes are directly related to instruments in the system. They reflect tangible hardware entities in the system, and the operations are concerned with controlling that hardware. These objects operate autonomously to collect data at the specified frequency and store the collected data locally. This data is delivered to the WeatherData object on request.

You use knowledge of the application domain to identify other objects, attributes, and services:

1. Weather stations are often located in remote places and include various instruments that sometimes go wrong. Instrument failures should be reported automatically. This implies that you need attributes and operations to check the correct functioning of the instruments.

2. There are many remote weather stations, so each weather station should have its own identifier so that it can be uniquely identified in communications.

3. As weather stations are installed at different times, the types of instrument may be different. Therefore, each instrument should also be uniquely identified, and a database of instrument information should be maintained.

At this stage in the design process, you should focus on the objects themselves, without thinking about how these objects might be implemented. Once you have identified the objects, you then refine the object design. You look for common features and then design the inheritance hierarchy for the system. For example, you may identify an Instrument superclass, which defines the common features of all instruments, such as an identifier, and get and test operations. You may also add new attributes and operations to the superclass, such as an attribute that records how often data should be collected.

7.1.4 Design models

Design or system models, as I discussed in Chapter 5, show the objects or object classes in a system. They also show the associations and relationships between these entities. These models are the bridge between the system requirements and the implementation of a system. They have to be abstract so that unnecessary detail doesn’t hide the relationships between them and the system requirements. However, they also have to include enough detail for programmers to make implementation decisions.

The level of detail that you need in a design model depends on the design process used. Where there are close links between requirements engineers, designers, and programmers, then abstract models may be all that are required. Specific design decisions may be made as the system is implemented, with problems resolved through informal discussions. Similarly, if agile development is used, outline design models on a whiteboard may be all that is required.


However, if a plan-based development process is used, you may need more detailed models. When the links between requirements engineers, designers, and programmers are indirect (e.g., where a system is being designed in one part of an organization but implemented elsewhere), then precise design descriptions are needed for communication. Detailed models, derived from the high-level abstract models, are used so that all team members have a common understanding of the design.

An important step in the design process, therefore, is to decide on the design models that you need and the level of detail required in these models. This depends on the type of system that is being developed. A sequential data-processing system is quite different from an embedded real-time system, so you need to use different types of design models.

The UML supports 13 different types of models, but, as I discussed in Chapter 5, many of these models are not widely used. Minimizing the number of models that are produced reduces the costs of the design and the time required to complete the design process.

When you use the UML to develop a design, you should develop two kinds of design model:

1. Structural models, which describe the static structure of the system using object classes and their relationships. Important relationships that may be documented at this stage are generalization (inheritance) relationships, uses/used-by relationships, and composition relationships.

2. Dynamic models, which describe the dynamic structure of the system and show the expected runtime interactions between the system objects. Interactions that may be documented include the sequence of service requests made by objects and the state changes triggered by these object interactions.

I think three UML model types are particularly useful for adding detail to use case and architectural models:

1. Subsystem models, which show logical groupings of objects into coherent subsystems. These are represented using a form of class diagram with each subsystem shown as a package with enclosed objects. Subsystem models are structural models.

2. Sequence models, which show the sequence of object interactions. These are represented using a UML sequence or a collaboration diagram. Sequence models are dynamic models.

3. State machine models, which show how objects change their state in response to events. These are represented in the UML using state diagrams. State machine models are dynamic models.

A subsystem model is a useful static model that shows how a design is organized into logically related groups of objects. I have already shown this type of model in Figure 7.4 to present the subsystems in the weather mapping system. As well as subsystem models, you may also design detailed object models, showing the objects in the systems and their associations (inheritance, generalization, aggregation, etc.). However, there is a danger

in doing too much modeling. You should not make detailed decisions about the implementation that are really best left until the system is implemented.

[Figure 7.7 Sequence diagram describing data collection: the weather information system sends request (report) to :SatComms, which sends reportWeather () to :WeatherStation; :WeatherStation calls get (summary) on :Commslink, which calls summarize () on :WeatherData; the report is then returned via send (report) and reply (report), with acknowledge messages at each step.]

Sequence models are dynamic models that describe, for each mode of interaction, the sequence of object interactions that take place. When documenting a design, you should produce a sequence model for each significant interaction. If you have developed a use case model, then there should be a sequence model for each use case that you have identified.

Figure 7.7 is an example of a sequence model, shown as a UML sequence diagram. This diagram shows the sequence of interactions that take place when an external system requests the summarized data from the weather station. You read sequence diagrams from top to bottom:

1. The SatComms object receives a request from the weather information system to collect a weather report from a weather station. It acknowledges receipt of this request. The stick arrowhead on the sent message indicates that the external system does not wait for a reply but can carry on with other processing.

2. SatComms sends a message to WeatherStation, via a satellite link, to create a summary of the collected weather data. Again, the stick arrowhead indicates that SatComms does not suspend itself waiting for a reply.

3. WeatherStation sends a message to a Commslink object to summarize the weather data. In this case, the squared-off style of arrowhead indicates that the instance of the WeatherStation object class waits for a reply.

4. Commslink calls the summarize method in the object WeatherData and waits for a reply.

[Figure 7.8 Weather station state diagram: states Shutdown, Running, Testing, Transmitting, Collecting, Summarizing, Configuring, and Controlled, with transitions labeled restart(), shutdown(), reconfigure(), powerSave(), reportWeather(), remoteControl(), reportStatus(), clock, transmission done, test complete, collection done, weather summary complete, and configuration done.]

5. The weather data summary is computed and returned to WeatherStation via the Commslink object.

6. WeatherStation then calls the SatComms object to transmit the summarized data to the weather information system, through the satellite communications system.

The SatComms and WeatherStation objects may be implemented as concurrent processes, whose execution can be suspended and resumed. The SatComms object instance listens for messages from the external system, decodes these messages, and initiates weather station operations.
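The synchronous part of this interaction (steps 3 to 5 above) can be sketched in Java. This is an illustrative sketch, not code from the book: the record method, the return value of reportWeather, and the use of a mean as the summary computation are all assumptions, and the asynchronous satellite messages (stick arrowheads) are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// WeatherData holds the collected readings and computes the summary.
class WeatherData {
    private final List<Double> readings = new ArrayList<>();

    void record(double value) { readings.add(value); }

    // The summary is assumed here to be a simple mean of the readings.
    double summarize() {
        double total = 0;
        for (double r : readings) total += r;
        return readings.isEmpty() ? 0 : total / readings.size();
    }
}

// Commslink forwards the summarize request and waits for the reply
// (the squared-off arrowhead in Figure 7.7: a blocking call).
class Commslink {
    private final WeatherData data;
    Commslink(WeatherData data) { this.data = data; }
    double summarize() { return data.summarize(); }
}

// WeatherStation asks Commslink for the summary and waits for the reply.
class WeatherStation {
    private final Commslink link;
    WeatherStation(Commslink link) { this.link = link; }
    double reportWeather() { return link.summarize(); }
}
```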

Sequence diagrams are used to model the combined behavior of a group of objects, but you may also want to summarize the behavior of an object or a subsystem in response to messages and events. To do this, you can use a state machine model that shows how the object instance changes state depending on the messages that it receives. As I discuss in Chapter 5, the UML includes state diagrams to describe state machine models.

Figure 7.8 is a state diagram for the weather station system that shows how it responds to requests for various services. You can read this diagram as follows:

1. If the system state is Shutdown, then it can respond to a restart(), a reconfigure(), or a powerSave() message. The unlabeled arrow with the black blob indicates that the Shutdown state is the initial state. A restart() message causes a transition to normal operation. Both the powerSave() and reconfigure() messages cause a transition to a state in which the system reconfigures itself. The state diagram shows that reconfiguration is allowed only if the system has been shut down.

2. In the Running state, the system expects further messages. If a shutdown() message is received, the object returns to the shutdown state.

3. If a reportWeather() message is received, the system moves to the Summarizing state. When the summary is complete, the system moves to a Transmitting state where the information is transmitted to the remote system. It then returns to the Running state.

4. If a signal from the clock is received, the system moves to the Collecting state, where it collects data from the instruments. Each instrument is instructed in turn to collect its data from the associated sensors.

5. If a remoteControl() message is received, the system moves to a controlled state in which it responds to a different set of messages from the remote control room. These are not shown on this diagram.

State diagrams are useful high-level models of a system or an object's operation. However, you don't need a state diagram for all of the objects in the system. Many objects in a system are simple, and their operation can be easily described without a state model.
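The transitions described above can be sketched as a simple state machine in Java. This is an illustrative partial model, not code from the book: only the transitions listed in the text are handled, the Testing state and the Controlled state's message set are omitted or stubbed, event names are simplified to strings, and the target of the configuration done transition is an assumption.

```java
// A minimal sketch of the Figure 7.8 state machine as a transition function.
class WeatherStationStateMachine {
    enum State { SHUTDOWN, RUNNING, CONFIGURING, SUMMARIZING, TRANSMITTING, COLLECTING, CONTROLLED }

    private State state = State.SHUTDOWN;   // the black blob marks Shutdown as the initial state

    State getState() { return state; }

    // Apply an event; events not listed for the current state are ignored.
    void handle(String event) {
        switch (state) {
            case SHUTDOWN:
                if (event.equals("restart")) state = State.RUNNING;
                else if (event.equals("reconfigure") || event.equals("powerSave"))
                    state = State.CONFIGURING;   // reconfiguration only when shut down
                break;
            case CONFIGURING:
                // Where configuration done leads is not stated in the text;
                // Running is assumed here.
                if (event.equals("configuration done")) state = State.RUNNING;
                break;
            case RUNNING:
                if (event.equals("shutdown")) state = State.SHUTDOWN;
                else if (event.equals("reportWeather")) state = State.SUMMARIZING;
                else if (event.equals("clock")) state = State.COLLECTING;
                else if (event.equals("remoteControl")) state = State.CONTROLLED;
                break;
            case SUMMARIZING:
                if (event.equals("weather summary complete")) state = State.TRANSMITTING;
                break;
            case TRANSMITTING:
                if (event.equals("transmission done")) state = State.RUNNING;
                break;
            case COLLECTING:
                if (event.equals("collection done")) state = State.RUNNING;
                break;
            case CONTROLLED:
                break;   // responds to a different message set, not shown in the diagram
        }
    }
}
```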

7.1.5 Interface specification

An important part of any design process is the specification of the interfaces between the components in the design. You need to specify interfaces so that objects and subsystems can be designed in parallel. Once an interface has been specified, the developers of other objects may assume that interface will be implemented.

Interface design is concerned with specifying the detail of the interface to an object or to a group of objects. This means defining the signatures and semantics of the services that are provided by the object or by a group of objects. Interfaces can be specified in the UML using the same notation as a class diagram. However, there is no attribute section, and the UML stereotype «interface» should be included in the name part. The semantics of the interface may be defined using the object constraint language (OCL). I discuss the use of the OCL in Chapter 16, where I explain how it can be used to describe the semantics of components.

You should not include details of the data representation in an interface design, as attributes are not defined in an interface specification. However, you should include operations to access and update data. As the data representation is hidden, it can be easily changed without affecting the objects that use that data. This leads to a design that is inherently more maintainable. For example, an array representation of a stack may be changed to a list representation without affecting other objects that use the stack. By contrast, you should normally expose the attributes in an object model, as this is the clearest way of describing the essential characteristics of the objects.
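The stack example can be sketched in Java. The interface exposes operations but no data representation, so clients are unaffected when the array implementation is swapped for a list. The interface shape and the int element type are illustrative assumptions.

```java
// Clients depend only on this interface, not on any representation.
interface Stack {
    void push(int value);
    int pop();            // removes and returns the top element
    boolean isEmpty();
}

// Array representation of a stack.
class ArrayStack implements Stack {
    private int[] elements = new int[16];
    private int size = 0;

    public void push(int value) {
        if (size == elements.length) {
            elements = java.util.Arrays.copyOf(elements, size * 2);  // grow when full
        }
        elements[size++] = value;
    }
    public int pop() { return elements[--size]; }
    public boolean isEmpty() { return size == 0; }
}

// List representation; code written against Stack needs no changes.
class ListStack implements Stack {
    private final java.util.LinkedList<Integer> elements = new java.util.LinkedList<>();

    public void push(int value) { elements.addFirst(value); }
    public int pop() { return elements.removeFirst(); }
    public boolean isEmpty() { return elements.isEmpty(); }
}
```

Either implementation can be passed wherever a Stack is expected, which is exactly the maintainability benefit described above.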

There is not a simple 1:1 relationship between objects and interfaces. The same object may have several interfaces, each of which is a viewpoint on the methods that it provides. This is supported directly in Java, where interfaces are declared separately from objects and objects "implement" interfaces. Equally, a group of objects may all be accessed through a single interface.

[Figure 7.9 Weather station interfaces: «interface» Reporting with operations weatherReport (WS-Ident): Wreport and statusReport (WS-Ident): Sreport; «interface» Remote Control with operations startInstrument (instrument): iStatus, stopInstrument (instrument): iStatus, collectData (instrument): iStatus, and provideData (instrument): string.]

Figure 7.9 shows two interfaces that may be defined for the weather station. The left-hand interface is a reporting interface that defines the operation names that are used to generate weather and status reports. These map directly to operations in the WeatherStation object. The remote control interface provides four operations, which map onto a single method in the WeatherStation object. In this case, the individual operations are encoded in the command string associated with the remoteControl method, shown in Figure 7.6.
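These two interfaces can be sketched in Java as follows. The report and status types are simplified to String (stand-ins for Wreport, Sreport, and iStatus), and the command-string encoding shown is an illustrative assumption; Figure 7.6 itself is not reproduced here.

```java
// The reporting viewpoint on the weather station (Figure 7.9, left).
interface Reporting {
    String weatherReport(String wsIdent);   // Wreport in the figure
    String statusReport(String wsIdent);    // Sreport in the figure
}

// The remote control viewpoint (Figure 7.9, right).
interface RemoteControl {
    String startInstrument(String instrument);
    String stopInstrument(String instrument);
    String collectData(String instrument);
    String provideData(String instrument);
}

// One object can implement several interfaces, each a viewpoint on its methods.
class WeatherStation implements Reporting, RemoteControl {
    // The four RemoteControl operations all map onto this single method,
    // with the operation encoded in the command string.
    private String remoteControl(String command) {
        return "executed: " + command;
    }

    public String weatherReport(String wsIdent) { return "Wreport for " + wsIdent; }
    public String statusReport(String wsIdent)  { return "Sreport for " + wsIdent; }

    public String startInstrument(String instrument) { return remoteControl("start " + instrument); }
    public String stopInstrument(String instrument)  { return remoteControl("stop " + instrument); }
    public String collectData(String instrument)     { return remoteControl("collect " + instrument); }
    public String provideData(String instrument)     { return remoteControl("provide " + instrument); }
}
```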

7.2 Design patterns

Design patterns were derived from ideas put forward by Christopher Alexander (Alexander 1979), who suggested that there were certain common patterns of building design that were inherently pleasing and effective. The pattern is a description of the problem and the essence of its solution, so that the solution may be reused in different settings. The pattern is not a detailed specification. Rather, you can think of it as a description of accumulated wisdom and experience, a well-tried solution to a common problem.

A quote from the Hillside Group website (hillside.net/patterns/), which is dedicated to maintaining information about patterns, encapsulates their role in reuse:

Patterns and Pattern Languages are ways to describe best practices, good designs, and capture experience in a way that it is possible for others to reuse this experience.

Patterns have made a huge impact on object-oriented software design. As well as being tested solutions to common problems, they have become a vocabulary for talking about a design. You can therefore explain your design by describing the patterns that you have used. This is particularly true for the best-known design patterns that were originally described by the "Gang of Four" in their patterns book, published in 1995 (Gamma et al. 1995). Other important pattern descriptions are those published in a series of books by authors from Siemens, a large European technology company (Buschmann et al. 1996; Schmidt et al. 2000; Kircher and Jain 2004; Buschmann, Henney, and Schmidt 2007a, 2007b).

Patterns are a way of reusing the knowledge and experience of other designers. Design patterns are usually associated with object-oriented design. Published patterns often rely on object characteristics such as inheritance and polymorphism to provide generality. However, the general principle of encapsulating experience in a pattern is

†The Hillside Group: hillside.net/patterns

Figure 7.10 The Observer pattern:

Pattern name: Observer

Description: Separates the display of the state of an object from the object itself and allows alternative displays to be provided. When the object state changes, all displays are automatically notified and updated to reflect the change.

Problem description: In many situations, you have to provide multiple displays of state information, such as a graphical display and a tabular display. Not all of these may be known when the information is specified. All alternative presentations should support interaction and, when the state is changed, all displays must be updated.

This pattern may be used in situations where more than one display format for state information is required and where it is not necessary for the object that maintains the state information to know about the specific display formats used.

Solution description: This involves two abstract objects, Subject and Observer, and two concrete objects, ConcreteSubject and ConcreteObserver, which inherit the attributes of the related abstract objects. The abstract objects include general operations that are applicable in all situations. The state to be displayed is maintained in ConcreteSubject, which inherits operations from Subject allowing it to add and remove Observers (each observer corresponds to a display) and to issue a notification when the state has changed.

The ConcreteObserver maintains a copy of the state of ConcreteSubject and implements the Update() interface of Observer that allows these copies to be kept in step. The ConcreteObserver automatically displays the state and reflects changes whenever the state is updated.

The UML model of the pattern is shown in Figure 7.12.

Consequences: The subject only knows the abstract Observer and does not know details of the concrete class. Therefore there is minimal coupling between these objects. Because of this lack of knowledge, optimizations that enhance display performance are impractical. Changes to the subject may cause a set of linked updates to observers to be generated, some of which may not be necessary.

one that is equally applicable to any kind of software design. For instance, you could have configuration patterns for instantiating reusable application systems.

The Gang of Four defined the four essential elements of design patterns in their book on patterns:

1. A name that is a meaningful reference to the pattern.

2. A description of the problem area that explains when the pattern may be applied.

3. A solution description of the parts of the design solution, their relationships, and their responsibilities. This is not a concrete design description. It is a template for a design solution that can be instantiated in different ways. This is often expressed graphically and shows the relationships between the objects and object classes in the solution.

4. A statement of the consequences—the results and trade-offs—of applying the pattern. This can help designers understand whether or not a pattern can be used in a particular situation.

Gamma and his co-authors break down the problem description into motivation (a description of why the pattern is useful) and applicability (a description of situations in which the pattern may be used). Under the description of the solution, they describe the pattern structure, participants, collaborations, and implementation.

To illustrate pattern description, I use the Observer pattern, taken from the Gang of Four's patterns book. This is shown in Figure 7.10. In my description, I use the

[Figure 7.11 Multiple displays: two different graphical presentations, Observer 1 and Observer 2, of the same dataset (A: 40, B: 25, C: 15, D: 20) held by the Subject.]

four essential description elements and also include a brief statement of what the pattern can do. This pattern can be used in situations where different presentations of an object's state are required. It separates the object that must be displayed from the different forms of presentation. This is illustrated in Figure 7.11, which shows two different graphical presentations of the same dataset.

Graphical representations are normally used to illustrate the object classes in patterns and their relationships. These supplement the pattern description and add detail to the solution description. Figure 7.12 is the representation in UML of the Observer pattern.
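The structure in Figure 7.12 can be sketched in Java as follows. Java naming conventions are used (attach rather than Attach), the state is simplified to an int, and the display step of ConcreteObserver is omitted; these simplifications are mine, not the pattern's.

```java
import java.util.ArrayList;
import java.util.List;

// Observer declares the Update() operation from the figure.
interface Observer {
    void update();
}

// Subject manages the observer list: Attach, Detach, and Notify in the figure.
abstract class Subject {
    private final List<Observer> observers = new ArrayList<>();

    void attach(Observer o) { observers.add(o); }
    void detach(Observer o) { observers.remove(o); }
    void notifyObservers() {                 // for all o in observers: o -> Update()
        for (Observer o : observers) o.update();
    }
}

// ConcreteSubject holds subjectState and notifies observers when it changes.
class ConcreteSubject extends Subject {
    private int subjectState;

    int getState() { return subjectState; }
    void setState(int s) { subjectState = s; notifyObservers(); }
}

// ConcreteObserver keeps observerState in step via subject.getState().
class ConcreteObserver implements Observer {
    private final ConcreteSubject subject;
    private int observerState;

    ConcreteObserver(ConcreteSubject subject) {
        this.subject = subject;
        subject.attach(this);                // register for notifications
    }
    public void update() { observerState = subject.getState(); }
    int getObserverState() { return observerState; }
}
```

Note that the subject only sees the abstract Observer type, which is the minimal coupling described under Consequences in Figure 7.10.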

To use patterns in your design, you need to recognize that any design problem you are facing may have an associated pattern that can be applied. Examples of such problems, documented in the Gang of Four's original patterns book, include:

1. Tell several objects that the state of some other object has changed (Observer pattern).

2. Tidy up the interfaces to a number of related objects that have often been developed incrementally (Façade pattern).

[Figure 7.12 A UML model of the Observer pattern: abstract classes Subject (Attach (Observer), Detach (Observer), Notify (), where Notify () calls o -> Update () for all o in observers) and Observer (Update ()); concrete classes ConcreteSubject (GetState (), returning subjectState) and ConcreteObserver (Update (), which sets observerState = subject -> GetState ()).]

3. Provide a standard way of accessing the elements in a collection, irrespective of how that collection is implemented (Iterator pattern).

4. Allow for the possibility of extending the functionality of an existing class at runtime (Decorator pattern).
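As an illustration of item 4, a minimal Decorator sketch in Java (the Message interface and class names are invented for this example, not taken from the Gang of Four book): a decorator implements the same interface as the object it wraps, so behavior can be added at runtime without changing the existing class.

```java
// The common interface shared by components and decorators.
interface Message {
    String text();
}

// An existing class whose functionality we want to extend.
class PlainMessage implements Message {
    private final String body;
    PlainMessage(String body) { this.body = body; }
    public String text() { return body; }
}

// A decorator: wraps any Message and prepends a timestamp at runtime.
class TimestampedMessage implements Message {
    private final Message wrapped;
    private final String timestamp;
    TimestampedMessage(Message wrapped, String timestamp) {
        this.wrapped = wrapped;
        this.timestamp = timestamp;
    }
    public String text() { return "[" + timestamp + "] " + wrapped.text(); }
}
```

Because decorators share the component interface, they can be stacked in any combination without modifying PlainMessage.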

Patterns support high-level concept reuse. When you try to reuse executable components you are inevitably constrained by detailed design decisions that have been made by the implementers of these components. These range from the particular algorithms that have been used to implement the components to the objects and types in the component interfaces. When these design decisions conflict with your requirements, reusing the component is either impossible or introduces inefficiencies into your system. Using patterns means that you reuse the ideas but can adapt the implementation to suit the system you are developing.

When you start designing a system, it can be difficult to know, in advance, if you will need a particular pattern. Therefore, using patterns in a design process often involves developing a design, experiencing a problem, and then recognizing that a pattern can be used. This is certainly possible if you focus on the 23 general-purpose patterns documented in the original patterns book. However, if your problem is a different one, you may find it difficult to find an appropriate pattern among the hundreds of different patterns that have been proposed.

Patterns are a great idea, but you need experience of software design to use them effectively. You have to recognize situations where a pattern can be applied. Inexperienced programmers, even if they have read the pattern books, will always find it hard to decide whether they can reuse a pattern or need to develop a special-purpose solution.

7.3 Implementation issues

Software engineering includes all of the activities involved in software development from the initial requirements of the system through to maintenance and management of the deployed system. A critical stage of this process is, of course, system implementation, where you create an executable version of the software. Implementation may involve developing programs in high- or low-level programming languages or tailoring and adapting generic, off-the-shelf systems to meet the specific requirements of an organization.

I assume that most readers of this book will understand programming principles and will have some programming experience. As this chapter is intended to offer a language-independent approach, I haven't focused on issues of good programming practice as language-specific examples need to be used. Instead, I introduce some aspects of implementation that are particularly important to software engineering and that are often not covered in programming texts. These are:

1. Reuse Most modern software is constructed by reusing existing components or systems. When you are developing software, you should make as much use as possible of existing code.

[Figure 7.13 Software reuse: levels of reuse ordered by increasing abstraction, from programming language libraries (object level) and component frameworks (component level), through architectural and design patterns (abstraction level), to application systems/COTS (system level).]

2. Configuration management During the development process, many different versions of each software component are created. If you don't keep track of these versions in a configuration management system, you are liable to include the wrong versions of these components in your system.

3. Host-target development Production software does not usually execute on the same computer as the software development environment. Rather, you develop it on one computer (the host system) and execute it on a separate computer (the target system). The host and target systems are sometimes of the same type, but often they are completely different.

7.3.1 Reuse

From the 1960s to the 1990s, most new software was developed from scratch, by writing all code in a high-level programming language. The only significant reuse of software was the reuse of functions and objects in programming language libraries. However, costs and schedule pressure meant that this approach became increasingly unviable, especially for commercial and Internet-based systems. Consequently, an approach to development based on the reuse of existing software is now the norm for many types of system development. A reuse-based approach is now widely used for web-based systems of all kinds, scientific software, and, increasingly, in embedded systems engineering.

Software reuse is possible at a number of different levels, as shown in Figure 7.13:

1. The abstraction level At this level, you don't reuse software directly but rather use knowledge of successful abstractions in the design of your software. Design patterns and architectural patterns (covered in Chapter 6) are ways of representing abstract knowledge for reuse.

2. The object level At this level, you directly reuse objects from a library rather than writing the code yourself. To implement this type of reuse, you have to find appropriate libraries and discover if the objects and methods offer the functionality that you need. For example, if you need to process email messages in a Java program, you may use objects and methods from a JavaMail library.

3. The component level Components are collections of objects and object classes that operate together to provide related functions and services. You often have to adapt and extend the component by adding some code of your own. An example of component-level reuse is where you build your user interface using a framework. This is a set of general object classes that implement event handling, display management, etc. You add connections to the data to be displayed and write code to define specific display details such as screen layout and colors.

4. The system level At this level, you reuse entire application systems. This usually involves some kind of configuration of these systems. This may be done by adding and modifying code (if you are reusing a software product line) or by using the system's own configuration interface. Most commercial systems are now built in this way, where generic application systems are adapted and reused. Sometimes this approach may involve integrating several application systems to create a new system.

By reusing existing software, you can develop new systems more quickly, with fewer development risks and at lower cost. As the reused software has been tested in other applications, it should be more reliable than new software. However, there are costs associated with reuse:

1. The costs of the time spent in looking for software to reuse and assessing whether or not it meets your needs. You may have to test the software to make sure that it will work in your environment, especially if this is different from its development environment.

2. Where applicable, the costs of buying the reusable software. For large off-the-shelf systems, these costs can be very high.

3. The costs of adapting and configuring the reusable software components or systems to reflect the requirements of the system that you are developing.

4. The costs of integrating reusable software elements with each other (if you are using software from different sources) and with the new code that you have developed. Integrating reusable software from different providers can be difficult and expensive because the providers may make conflicting assumptions about how their respective software will be reused.

How to reuse existing knowledge and software should be the first thing you think about when starting a software development project. You should consider the

[Figure 7.14 Configuration management: change proposals flow into change management; component versions into version management; system versions into system building; and system releases into release management.]

possibilities of reuse before designing the software in detail, as you may wish to adapt your design to reuse existing software assets. As I discussed in Chapter 2, in a reuse-oriented development process, you search for reusable elements, then modify your requirements and design to make the best use of these.

Because of the importance of reuse in modern software engineering, I devote several chapters in Part 3 of this book to this topic (Chapters 15, 16, and 18).

7.3.2 Configuration management

In software development, change happens all the time, so change management is absolutely essential. When several people are involved in developing a software system, you have to make sure that team members don't interfere with each other's work. That is, if two people are working on a component, their changes have to be coordinated. Otherwise, one programmer may make changes and overwrite the other's work. You also have to ensure that everyone can access the most up-to-date versions of software components; otherwise developers may redo work that has already been done. When something goes wrong with a new version of a system, you have to be able to go back to a working version of the system or component.

Configuration management is the name given to the general process of managing a changing software system. The aim of configuration management is to support the system integration process so that all developers can access the project code and documents in a controlled way, find out what changes have been made, and compile and link components to create a system. As shown in Figure 7.14, there are four fundamental configuration management activities:

1. Version management, where support is provided to keep track of the different versions of software components. Version management systems include facilities to coordinate development by several programmers. They stop one developer from overwriting code that has been submitted to the system by someone else.

2. System integration, where support is provided to help developers define what versions of components are used to create each version of a system. This

[Figure 7.15 Host-target development: a development platform (host) providing an IDE, compilers, and testing tools, from which software is downloaded to an execution platform (target) providing libraries, related systems, and databases.]

description is then used to build a system automatically by compiling and linking the required components.

3. Problem tracking, where support is provided to allow users to report bugs and other problems, and to allow all developers to see who is working on these problems and when they are fixed.

4. Release management, where new versions of a software system are released to customers. Release management is concerned with planning the functionality of new releases and organizing the software for distribution.

Software configuration management tools support each of the above activities. These tools are usually installed in an integrated development environment, such as Eclipse. Version management may be supported using a version management system such as Subversion (Pilato, Collins-Sussman, and Fitzpatrick 2008) or Git (Loeliger and McCullough 2012), which can support multi-site, multi-team development. System integration support may be built into the language or rely on a separate tool-set such as the GNU build system. Bug tracking or issue tracking systems, such as Bugzilla, are used to report bugs and other issues and to keep track of whether or not these have been fixed. A comprehensive set of tools built around the Git system is available at GitHub (http://github.com).

Because of its importance in professional software engineering, I discuss change and configuration management in more detail in Chapter 25.

7.3.3 Host-target development

Most professional software development is based on a host-target model (Figure 7.15). Software is developed on one computer (the host) but runs on a separate machine (the target). More generally, we can talk about a development platform (host) and an execution platform (target). A platform is more than just hardware. It includes the installed operating system plus other supporting software such as a database management system or, for development platforms, an interactive development environment.

Sometimes, the development platform and execution platform are the same, making it possible to develop the software and test it on the same machine. For example, if you develop in Java, the target environment is the Java Virtual Machine. In principle, this is the same on every computer, so programs should be portable from one machine to another. However, particularly for embedded systems and mobile systems, the development and the execution platforms are different. You need to either move your developed software to the execution platform for testing or run a simulator on your development machine.

Simulators are often used when developing embedded systems. You simulate hardware devices, such as sensors, and the events in the environment in which the system will be deployed. Simulators speed up the development process for embedded systems as each developer can have his or her own execution platform with no need to download the software to the target hardware. However, simulators are expensive to develop and so are usually available only for the most popular hardware architectures.
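The idea can be sketched in Java (the interface and class names here are illustrative, not from the book): code under test depends on a sensor abstraction, so on the host it can be bound to a simulated device that replays scripted readings, while on the target the same code is bound to a driver for the real hardware.

```java
// The abstraction that application code depends on.
interface Sensor {
    double read();   // e.g., a temperature reading
}

// Host-side simulation: replays a script of readings, holding the last value.
class SimulatedSensor implements Sensor {
    private final double[] script;
    private int next = 0;
    SimulatedSensor(double[] script) { this.script = script; }

    public double read() {
        double value = script[next];
        if (next < script.length - 1) next++;   // keep returning the final value
        return value;
    }
}

// Application code is unaware of whether the sensor is real or simulated.
class WeatherCollector {
    private final Sensor sensor;
    WeatherCollector(Sensor sensor) { this.sensor = sensor; }

    // Take a number of samples and return the maximum reading.
    double maxOf(int samples) {
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < samples; i++) max = Math.max(max, sensor.read());
        return max;
    }
}
```

Each developer can run WeatherCollector against a SimulatedSensor on the host, with no download to target hardware needed.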

If the target system has installed middleware or other software that you need to use, then you need to be able to test the system using that software. It may be impractical to install that software on your development machine, even if it is the same as the target platform, because of license restrictions. If this is the case, you need to transfer your developed code to the execution platform to test the system.

A software development platform should provide a range of tools to support software engineering processes. These may include:

1. An integrated compiler and syntax-directed editing system that allows you to create, edit, and compile code.

2. A language debugging system.

3. Graphical editing tools, such as tools to edit UML models.

4. Testing tools, such as JUnit, that can automatically run a set of tests on a new version of a program.

5. Tools to support refactoring and program visualization.

6. Configuration management tools to manage source code versions and to integrate and build systems.
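To illustrate item 4, the following toy sketch shows the essence of what a testing tool such as JUnit automates: running a named set of checks against a new version of a program and reporting which ones failed. This is not the JUnit API; real JUnit discovers @Test methods by reflection and provides rich assertion and reporting facilities.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A minimal test runner: tests are named Runnables whose assertions
// throw on failure; run() executes all of them and collects failures.
class TestRunner {
    private final Map<String, Runnable> tests = new LinkedHashMap<>();

    void add(String name, Runnable test) { tests.put(name, test); }

    // Runs every registered test and returns the names of those that failed.
    List<String> run() {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Runnable> e : tests.entrySet()) {
            try {
                e.getValue().run();
            } catch (AssertionError | RuntimeException ex) {
                failures.add(e.getKey());
            }
        }
        return failures;
    }
}
```

The value of such a tool is that the whole test set can be rerun automatically after every change, which is what makes regression testing practical.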

In addition to these standard tools, your development system may include more specialized tools such as static analyzers (discussed in Chapter 12). Normally, development environments for teams also include a shared server that runs a change and configuration management system and, perhaps, a system to support requirements management.

Software development tools are now usually installed within an integrated development environment (IDE). An IDE is a set of software tools that supports different aspects of software development within some common framework and user interface. Generally, IDEs are created to support development in a specific programming language such as Java. The language IDE may be developed specially or may be an instantiation of a general-purpose IDE, with specific language-support tools.

UML deployment diagrams
UML deployment diagrams show how software components are physically deployed on processors. That is, the deployment diagram shows the hardware and software in the system and the middleware used to connect the different components in the system. Essentially, you can think of deployment diagrams as a way of defining and documenting the target environment.
http://software-engineering-book.com/web/deployment/

A general-purpose IDE is a framework for hosting software tools that provides data management facilities for the software being developed and integration mechanisms that allow tools to work together. The best-known general-purpose IDE is the Eclipse environment (http://www.eclipse.org). This environment is based on a plug-in architecture so that it can be specialized for different languages, such as Java, and application domains. Therefore, you can install Eclipse and tailor it for your specific needs by adding plug-ins. For example, you may add a set of plug-ins to support networked systems development in Java (Vogel 2013) or embedded systems engineering using C.

As part of the development process, you need to make decisions about how the developed software will be deployed on the target platform. This is straightforward for embedded systems, where the target is usually a single computer. However, for distributed systems, you need to decide on the specific platforms where the components will be deployed. Issues that you have to consider in making this decision are:

1. The hardware and software requirements of a component If a component is designed for a specific hardware architecture, or relies on some other software system, it must obviously be deployed on a platform that provides the required hardware and software support.

2. The availability requirements of the system High-availability systems may require components to be deployed on more than one platform. This means that, in the event of platform failure, an alternative implementation of the component is available.

3. Component communications If there is a lot of intercomponent communication, it is usually best to deploy them on the same platform or on platforms that are physically close to one another. This reduces communications latency, the delay between the time that a message is sent by one component and received by another.

You can document your decisions on hardware and software deployment using UML deployment diagrams, which show how software components are distributed across hardware platforms.

If you are developing an embedded system, you may have to take into account target characteristics, such as its physical size, power capabilities, the need for real-time responses to sensor events, the physical characteristics of actuators, and its real-time operating system. I discuss embedded systems engineering in Chapter 21.


7.4 Open-source development

Open-source development is an approach to software development in which the source code of a software system is published and volunteers are invited to participate in the development process (Raymond 2001). Its roots are in the Free Software Foundation (www.fsf.org), which advocates that source code should not be proprietary but rather should always be available for users to examine and modify as they wish. There was an assumption that the code would be controlled and developed by a small core group, rather than users of the code.

Open-source software extended this idea by using the Internet to recruit a much larger population of volunteer developers. Many of them are also users of the code. In principle at least, any contributor to an open-source project may report and fix bugs and propose new features and functionality. However, in practice, successful open-source systems still rely on a core group of developers who control changes to the software.

Open-source software is the backbone of the Internet and software engineering. The Linux operating system is the most widely used server system, as is the open-source Apache web server. Other important and universally used open-source products are Java, the Eclipse IDE, and the mySQL database management system. The Android operating system is installed on millions of mobile devices. Major players in the computer industry, such as IBM and Oracle, support the open-source movement and base their software on open-source products. Thousands of other, lesser-known open-source systems and components may also be used.

It is usually cheap or even free to acquire open-source software. You can normally download open-source software without charge. However, if you want documentation and support, then you may have to pay for this, but costs are usually fairly low.

The other key benefit of using open-source products is that widely used open-source systems are very reliable. They have a large population of users who are willing to fix problems themselves rather than report these problems to the developer and wait for a new release of the system. Bugs are discovered and repaired more quickly than is usually possible with proprietary software.

For a company involved in software development, there are two open-source issues that have to be considered:

1. Should the product that is being developed make use of open-source components?

2. Should an open-source approach be used for its own software development?

The answers to these questions depend on the type of software that is being developed and the background and experience of the development team.

If you are developing a software product for sale, then time to market and reduced costs are critical. If you are developing software in a domain in which there are high-quality open-source systems available, you can save time and money by using these systems.

However, if you are developing software to a specific set of organizational requirements, then using open-source components may not be an option. You may have to integrate your software with existing systems that are incompatible with available open-source systems. Even then, however, it could be quicker and cheaper to modify the open-source system rather than redevelop the functionality that you need.

Many software product companies are now using an open-source approach to development, especially for specialized systems. Their business model is not reliant on selling a software product but rather on selling support for that product. They believe that involving the open-source community will allow software to be developed more cheaply and more quickly and will create a community of users for the software.

Some companies believe that adopting an open-source approach will reveal confidential business knowledge to their competitors and so are reluctant to adopt this development model. However, if you are working in a small company and you open source your software, this may reassure customers that they will be able to support the software if your company goes out of business.

Publishing the source code of a system does not mean that people from the wider community will necessarily help with its development. Most successful open-source products have been platform products rather than application systems. There are a limited number of developers who might be interested in specialized application systems. Making a software system open source does not guarantee community involvement. There are thousands of open-source projects on Sourceforge and GitHub that have only a handful of downloads. However, if users of your software have concerns about its availability in future, making the software open source means that they can take their own copy and so be reassured that they will not lose access to it.

7.4.1 Open-source licensing

Although a fundamental principle of open-source development is that source code should be freely available, this does not mean that anyone can do as they wish with that code. Legally, the developer of the code (either a company or an individual) owns the code. They can place restrictions on how it is used by including legally binding conditions in an open-source software license (St. Laurent 2004). Some open-source developers believe that if an open-source component is used to develop a new system, then that system should also be open source. Others are willing to allow their code to be used without this restriction. The developed systems may be proprietary and sold as closed-source systems.

Most open-source licenses (Chapman 2010) are variants of one of three general models:

1. The GNU General Public License (GPL). This is a so-called reciprocal license that, put simply, means that if you use open-source software that is licensed under the GPL license, then you must make that software open source.

2. The GNU Lesser General Public License (LGPL). This is a variant of the GPL license where you can write components that link to open-source code without having to publish the source of these components. However, if you change the licensed component, then you must publish this as open source.

3. The Berkeley Standard Distribution (BSD) License. This is a nonreciprocal license, which means you are not obliged to re-publish any changes or modifications made to open-source code. You can include the code in proprietary systems that are sold. If you use open-source components, you must acknowledge the original creator of the code. The MIT license is a variant of the BSD license with similar conditions.

Licensing issues are important because if you use open-source software as part of a software product, then you may be obliged by the terms of the license to make your own product open source. If you are trying to sell your software, you may wish to keep it secret. This means that you may wish to avoid using GPL-licensed open-source software in its development.

If you are building software that runs on an open-source platform but that does not reuse open-source components, then licenses are not a problem. However, if you embed open-source software in your software, you need processes and databases to keep track of what's been used and their license conditions. Bayersdorfer (Bayersdorfer 2007) suggests that companies managing projects that use open source should:

1. Establish a system for maintaining information about open-source components that are downloaded and used. You have to keep a copy of the license for each component that was valid at the time the component was used. Licenses may change, so you need to know the conditions that you have agreed to.

2. Be aware of the different types of licenses and understand how a component is licensed before it is used. You may decide to use a component in one system but not in another because you plan to use these systems in different ways.

3. Be aware of evolution pathways for components. You need to know a bit about the open-source project where components are developed to understand how they might change in future.

4. Educate people about open source. It's not enough to have procedures in place to ensure compliance with license conditions. You also need to educate developers about open source and open-source licensing.

5. Have auditing systems in place. Developers, under tight deadlines, might be tempted to break the terms of a license. If possible, you should have software in place to detect and stop this.

6. Participate in the open-source community. If you rely on open-source products, you should participate in the community and help support their development.
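The first of these suggestions, a system for maintaining information about downloaded components, can start as a simple registry that records each component's license and adoption date, so that later license changes can be detected. A sketch in Python (the component names and the approval policy are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class ComponentRecord:
    name: str
    version: str
    license_name: str   # license in force when the component was adopted
    adopted_on: str     # ISO date, so later license changes can be detected

# Licenses this (hypothetical) company allows in proprietary products.
APPROVED_FOR_PROPRIETARY = {"MIT", "BSD-3-Clause", "Apache-2.0"}

def check_component(record):
    """Flag components whose license may oblige the product to be open source."""
    if record.license_name in APPROVED_FOR_PROPRIETARY:
        return "ok"
    return "review: " + record.license_name

inventory = [
    ComponentRecord("json-parser", "2.1", "MIT", "2015-03-02"),
    ComponentRecord("media-codec", "0.9", "GPL-3.0", "2015-06-17"),
]
for rec in inventory:
    print(rec.name, check_component(rec))
```

Real compliance tooling does much more (scanning source trees, matching license texts), but even a registry like this gives an auditable record of the conditions that were agreed to.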

The open-source approach is one of several business models for software. In this model, companies release the source of their software and sell add-on services and advice in association with this. They may also sell cloud-based software services (an attractive option for users who do not have the expertise to manage their own open-source system) and also specialized versions of their system for particular clients. Open-source is therefore likely to increase in importance as a way of developing and distributing software.


Key Points

- Software design and implementation are interleaved activities. The level of detail in the design depends on the type of system being developed and whether you are using a plan-driven or agile approach.

- The process of object-oriented design includes activities to design the system architecture, identify objects in the system, describe the design using different object models, and document the component interfaces.

- A range of different models may be produced during an object-oriented design process. These include static models (class models, generalization models, association models) and dynamic models (sequence models, state machine models).

- Component interfaces must be defined precisely so that other objects can use them. A UML interface stereotype may be used to define interfaces.

- When developing software, you should always consider the possibility of reusing existing software, either as components, services, or complete systems.

- Configuration management is the process of managing changes to an evolving software system. It is essential when a team of people is cooperating to develop software.

- Most software development is host-target development. You use an IDE on a host machine to develop the software, which is transferred to a target machine for execution.

- Open-source development involves making the source code of a system publicly available. This means that many people can propose changes and improvements to the software.

Further Reading

Design Patterns: Elements of Reusable Object-oriented Software. This is the original software patterns handbook that introduced software patterns to a wide community. (E. Gamma, R. Helm, R. Johnson and J. Vlissides, Addison-Wesley, 1995).

Applying UML and Patterns: An Introduction to Object-oriented Analysis and Design and Iterative Development, 3rd ed. Larman writes clearly on object-oriented design and also discusses use of the UML; this is a good introduction to using patterns in the design process. Although it is more than 10 years old, it remains the best book on this topic that is available. (C. Larman, Prentice-Hall, 2004).

Producing Open Source Software: How to Run a Successful Free Software Project. This book is a comprehensive guide to the background to open-source software, licensing issues, and the practicalities of running an open-source development project. (K. Fogel, O'Reilly Media Inc., 2008).

Further reading on software reuse is suggested in Chapter 15 and on configuration management in Chapter 25.


Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/implementation-and-evolution/

More information on the weather information system: http://software-engineering-book.com/case-studies/wilderness-weather-station/

Exercises

7.1. Using the tabular notation shown in Figure 7.3, specify the weather station use cases for Report status and Reconfigure. You should make reasonable assumptions about the functionality that is required here.

7.2. Assume that the Mentcare system is being developed using an object-oriented approach. Draw a use case diagram showing at least six possible use cases for this system.

7.3. Using the UML graphical notation for object classes, design the following object classes, identifying attributes and operations. Use your own experience to decide on the attributes and operations that should be associated with these objects.
- a messaging system on a mobile (cell) phone or tablet
- a printer for a personal computer
- a personal music system
- a bank account
- a library catalogue

7.4. A shape can be classified into 2-D and 3-D. Design an inheritance hierarchy that will include different kinds of 2-D and 3-D shapes. Make sure you identify at least five other classes of shapes.

7.5. Develop the design of the weather station to show the interaction between the data collection subsystem and the instruments that collect weather data. Use sequence diagrams to show this interaction.

7.6. Identify possible objects in the following systems and develop an object-oriented design for them. You may make any reasonable assumptions about the systems when deriving the design.
- A group diary and time management system is intended to support the timetabling of meetings and appointments across a group of co-workers. When an appointment is to be made that involves a number of people, the system finds a common slot in each of their diaries and arranges the appointment for that time. If no common slots are available, it interacts with the user to rearrange his or her personal diary to make room for the appointment.
- A filling station (gas station) is to be set up for fully automated operation. Drivers swipe their credit card through a reader connected to the pump; the card is verified by communication with a credit company computer, and a fuel limit is established. The driver may then take the fuel required. When fuel delivery is complete and the pump hose is returned to its holster, the driver's credit card account is debited with the cost of the fuel taken. The credit card is returned after debiting. If the card is invalid, the pump returns it before fuel is dispensed.

7.7. Draw a sequence diagram showing the interactions of objects in a group diary system when a group of people are arranging a meeting.

7.8. Draw a UML state diagram showing the possible state changes in either the group diary or the filling station system.

7.9. When code is integrated into a larger system, problems may surface. Explain how configuration management can be useful when handling such problems.

7.10. A small company has developed a specialized software product that it configures specially for each customer. New customers usually have specific requirements to be incorporated into their system, and they pay for these to be developed and integrated with the product.

The software company has an opportunity to bid for a new contract, which would more than double its customer base. The new customer wishes to have some involvement in the configuration of the system. Explain why, in these circumstances, it might be a good idea for the company owning the software to make it open source.

References

Abbott, R. 1983. "Program Design by Informal English Descriptions." Comm. ACM 26 (11): 882–894. doi:10.1145/182.358441.

Alexander, C. 1979. A Timeless Way of Building. Oxford, UK: Oxford University Press.

Bayersdorfer, M. 2007. "Managing a Project with Open Source Components." ACM Interactions 14 (6): 33–34. doi:10.1145/1300655.1300677.

Beck, K., and W. Cunningham. 1989. "A Laboratory for Teaching Object-Oriented Thinking." In Proc. OOPSLA'89 (Conference on Object-Oriented Programming, Systems, Languages and Applications), 1–6. ACM Press. doi:10.1145/74878.74879.

Buschmann, F., K. Henney, and D. C. Schmidt. 2007a. Pattern-Oriented Software Architecture Volume 4: A Pattern Language for Distributed Computing. New York: John Wiley & Sons.

———. 2007b. Pattern-Oriented Software Architecture Volume 5: On Patterns and Pattern Languages. New York: John Wiley & Sons.

Buschmann, F., R. Meunier, H. Rohnert, and P. Sommerlad. 1996. Pattern-Oriented Software Architecture Volume 1: A System of Patterns. New York: John Wiley & Sons.

Chapman, C. 2010. "A Short Guide to Open-Source and Similar Licences." Smashing Magazine. http://www.smashingmagazine.com/2010/03/24/a-short-guide-to-open-source-and-similar-licenses/

Gamma, E., R. Helm, R. Johnson, and J. Vlissides. 1995. Design Patterns: Elements of Reusable Object-Oriented Software. Reading, MA: Addison-Wesley.

Kircher, M., and P. Jain. 2004. Pattern-Oriented Software Architecture Volume 3: Patterns for Resource Management. New York: John Wiley & Sons.

Loeliger, J., and M. McCullough. 2012. Version Control with Git: Powerful Tools and Techniques for Collaborative Software Development. Sebastopol, CA: O'Reilly & Associates.

Pilato, C., B. Collins-Sussman, and B. Fitzpatrick. 2008. Version Control with Subversion. Sebastopol, CA: O'Reilly & Associates.

Raymond, E. S. 2001. The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary. Sebastopol, CA: O'Reilly & Associates.

Schmidt, D., M. Stal, H. Rohnert, and F. Buschmann. 2000. Pattern-Oriented Software Architecture Volume 2: Patterns for Concurrent and Networked Objects. New York: John Wiley & Sons.

St. Laurent, A. 2004. Understanding Open Source and Free Software Licensing. Sebastopol, CA: O'Reilly & Associates.

Vogel, L. 2013. Eclipse IDE: A Tutorial. Hamburg, Germany: Vogella GmbH.

Wirfs-Brock, R., B. Wilkerson, and L. Weiner. 1990. Designing Object-Oriented Software. Englewood Cliffs, NJ: Prentice-Hall.

8 Software testing

Objectives
The objective of this chapter is to introduce software testing and software testing processes. When you have read the chapter, you will:
- understand the stages of testing from testing during development to acceptance testing by system customers;
- have been introduced to techniques that help you choose test cases that are geared to discovering program defects;
- understand test-first development, where you design tests before writing code and run these tests automatically;
- know about three distinct types of testing—component testing, system testing, and release testing;
- understand the distinctions between development testing and user testing.

Contents
8.1 Development testing
8.2 Test-driven development
8.3 Release testing
8.4 User testing


Testing is intended to show that a program does what it is intended to do and to discover program defects before it is put into use. When you test software, you execute a program using artificial data. You check the results of the test run for errors, anomalies, or information about the program's non-functional attributes.

When you test software, you are trying to do two things:

1. Demonstrate to the developer and the customer that the software meets its requirements. For custom software, this means that there should be at least one test for every requirement in the requirements document. For generic software products, it means that there should be tests for all of the system features that will be included in the product release. You may also test combinations of features to check for unwanted interactions between them.

2. Find inputs or input sequences where the behavior of the software is incorrect, undesirable, or does not conform to its specification. These are caused by defects (bugs) in the software. When you test software to find defects, you are trying to root out undesirable system behavior such as system crashes, unwanted interactions with other systems, incorrect computations, and data corruption.

The first of these is validation testing, where you expect the system to perform correctly using a set of test cases that reflect the system's expected use. The second is defect testing, where the test cases are designed to expose defects. The test cases in defect testing can be deliberately obscure and need not reflect how the system is normally used. Of course, there is no definite boundary between these two approaches to testing. During validation testing, you will find defects in the system; during defect testing, some of the tests will show that the program meets its requirements.

Figure 8.1 shows the differences between validation testing and defect testing. Think of the system being tested as a black box. The system accepts inputs from some input set I and generates outputs in an output set O. Some of the outputs will be erroneous. These are the outputs in set Oe that are generated by the system in response to inputs in the set Ie. The priority in defect testing is to find those inputs in the set Ie because these reveal problems with the system. Validation testing involves testing with correct inputs that are outside Ie. These stimulate the system to generate the expected correct outputs.
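The distinction can be made concrete with a small example. In the (deliberately buggy, invented) function below, validation tests drawn from expected use pass, while a defect test probes an obscure input in the set Ie and exposes anomalous behavior:

```python
def average(values):
    # Deliberately buggy: fails for the edge case of an empty list.
    return sum(values) / len(values)

# Validation testing: typical inputs from expected use; correct outputs.
assert average([2, 4, 6]) == 4
assert average([5]) == 5

# Defect testing: an obscure input in Ie that reveals a defect.
try:
    average([])
    defect_found = False
except ZeroDivisionError:
    defect_found = True  # anomalous behavior exposed

print(defect_found)  # True
```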

Testing cannot demonstrate that the software is free of defects or that it will behave as specified in every circumstance. It is always possible that a test you have overlooked could discover further problems with the system. As Edsger Dijkstra, an early contributor to the development of software engineering, eloquently stated (Dijkstra 1972)†:

"Testing can only show the presence of errors, not their absence."

Testing is part of a broader process of software verification and validation (V & V). Verification and validation are not the same thing, although they are often confused. Barry Boehm, a pioneer of software engineering, succinctly expressed the difference between them (Boehm 1979):

†Dijkstra, E. W. 1972. "The Humble Programmer." Comm. ACM 15 (10): 859–66. doi:10.1145/355604.361591


Figure 8.1 An input–output model of program testing (the input test data includes the set Ie of inputs causing anomalous behavior; the output test results include the set Oe of outputs that reveal the presence of defects)

Validation: Are we building the right product?

Verification: Are we building the product right?

Verification and validation processes are concerned with checking that software being developed meets its specification and delivers the functionality expected by the people paying for the software. These checking processes start as soon as requirements become available and continue through all stages of the development process.

Software verification is the process of checking that the software meets its stated functional and non-functional requirements. Validation is a more general process. The aim of software validation is to ensure that the software meets the customer's expectations. It goes beyond checking conformance with the specification to demonstrating that the software does what the customer expects it to do. Validation is essential because, as I discussed in Chapter 4, statements of requirements do not always reflect the real wishes or needs of system customers and users.

The goal of verification and validation processes is to establish confidence that the software system is "fit for purpose." This means that the system must be good enough for its intended use. The level of required confidence depends on the system's purpose, the expectations of the system users, and the current marketing environment for the system:

1. Software purpose The more critical the software, the more important it is that it is reliable. For example, the level of confidence required for software used to control a safety-critical system is much higher than that required for a demonstrator system that prototypes new product ideas.

2. User expectations Because of their previous experiences with buggy, unreliable software, users sometimes have low expectations of software quality. They are not surprised when their software fails. When a new system is installed, users may tolerate failures because the benefits of use outweigh the costs of failure recovery. However, as a software product becomes more established, users expect it to become more reliable. Consequently, more thorough testing of later versions of the system may be required.

3. Marketing environment When a software company brings a system to market, it must take into account competing products, the price that customers are willing to pay for a system, and the required schedule for delivering that system. In a competitive environment, the company may decide to release a program before it has been fully tested and debugged because it wants to be the first into the market. If a software product or app is very cheap, users may be willing to tolerate a lower level of reliability.

Figure 8.2 Inspections and testing (inspections may be applied to the requirements specification, software architecture, UML design models, database schemas, and program; testing is applied to the system prototype and program)

As well as software testing, the verification and validation process may involve software inspections and reviews. Inspections and reviews analyze and check the system requirements, design models, the program source code, and even proposed system tests. These are "static" V & V techniques in which you don't need to execute the software to verify it. Figure 8.2 shows that software inspections and testing support V & V at different stages in the software process. The arrows indicate the stages in the process where the techniques may be used.

Inspections mostly focus on the source code of a system, but any readable representation of the software, such as its requirements or a design model, can be inspected. When you inspect a system, you use knowledge of the system, its application domain, and the programming or modeling language to discover errors.

Software inspection has three advantages over testing:

1. During testing, errors can mask (hide) other errors. When an error leads to unexpected outputs, you can never be sure if later output anomalies are due to a new error or are side effects of the original error. Because inspection doesn't involve executing the system, you don't have to worry about interactions between errors. Consequently, a single inspection session can discover many errors in a system.


2. Incomplete versions of a system can be inspected without additional costs. If a program is incomplete, then you need to develop specialized test harnesses to test the parts that are available. This obviously adds to the system development costs.

3. As well as searching for program defects, an inspection can also consider broader quality attributes of a program, such as compliance with standards, portability, and maintainability. You can look for inefficiencies, inappropriate algorithms, and poor programming style that could make the system difficult to maintain and update.

Figure 8.3 A model of the software testing process (design test cases → prepare test data → run program with test data → compare results to test cases, producing test cases, test data, test results, and test reports)

Program inspections are an old idea, and several studies and experiments have shown that inspections are more effective for defect discovery than program testing. Fagan (Fagan 1976) reported that more than 60% of the errors in a program can be detected using informal program inspections. In the Cleanroom process (Prowell et al. 1999), it is claimed that more than 90% of defects can be discovered in program inspections.

However, inspections cannot replace software testing. Inspections are not good for discovering defects that arise because of unexpected interactions between different parts of a program, timing problems, or problems with system performance. In small companies or development groups, it can be difficult and expensive to put together a separate inspection team as all potential team members may also be developers of the software.

I discuss reviews and inspections in more detail in Chapter 24 (Quality Management). Static analysis, where the source text of a program is automatically analyzed to discover anomalies, is explained in Chapter 12. In this chapter, I focus on testing and testing processes.

Figure 8.3 is an abstract model of the traditional testing process, as used in plan-driven development. Test cases are specifications of the inputs to the test and the expected output from the system (the test results), plus a statement of what is being tested. Test data are the inputs that have been devised to test a system. Test data can sometimes be generated automatically, but automatic test case generation is impossible. People who understand what the system is supposed to do must be involved to specify the expected test results. However, test execution can be automated. The test results are automatically compared with the predicted results, so there is no need for a person to look for errors and anomalies in the test run.


Test planning

Test planning is concerned with scheduling and resourcing all of the activities in the testing process. It involves defining the testing process, taking into account the people and the time available. Usually, a test plan will be created that defines what is to be tested, the predicted testing schedule, and how tests will be recorded. For critical systems, the test plan may also include details of the tests to be run on the software.

http://software-engineering-book.com/web/test-planning/

Typically, a commercial software system has to go through three stages of testing:

1. Development testing, where the system is tested during development to discover bugs and defects. System designers and programmers are likely to be involved in the testing process.

2. Release testing, where a separate testing team tests a complete version of the system before it is released to users. The aim of release testing is to check that the system meets the requirements of the system stakeholders.

3. User testing, where users or potential users of a system test the system in their own environment. For software products, the “user” may be an internal marketing group that decides if the software can be marketed, released and sold.

Acceptance testing is one type of user testing where the customer formally tests a system to decide if it should be accepted from the system supplier or if further development is required.

In practice, the testing process usually involves a mixture of manual and automated testing. In manual testing, a tester runs the program with some test data and compares the results to their expectations. They note and report discrepancies to the program developers. In automated testing, the tests are encoded in a program that is run each time the system under development is to be tested. This is faster than manual testing, especially when it involves regression testing—re-running previous tests to check that changes to the program have not introduced new bugs.

Unfortunately, testing can never be completely automated as automated tests can only check that a program does what it is supposed to do. It is practically impossible to use automated testing to test systems that depend on how things look (e.g., a graphical user interface), or to test that a program does not have unanticipated side effects.

8.1 Development testing

Development testing includes all testing activities that are carried out by the team developing the system. The tester of the software is usually the programmer who developed that software. Some development processes use programmer/tester pairs (Cusumano and Selby 1998) where each programmer has an associated tester who develops tests and assists with the testing process. For critical systems, a more formal process may be used, with a separate testing group within the development team. This group is responsible for developing tests and maintaining detailed records of test results.

Debugging

Debugging is the process of fixing errors and problems that have been discovered by testing. Using information from the program tests, debuggers use their knowledge of the programming language and the intended outcome of the test to locate and repair the program error. When you are debugging a program, you usually use interactive tools that provide extra information about program execution.

http://software-engineering-book.com/web/debugging/

There are three stages of development testing:

1. Unit testing, where individual program units or object classes are tested. Unit testing should focus on testing the functionality of objects or methods.

2. Component testing, where several individual units are integrated to create composite components. Component testing should focus on testing the component interfaces that provide access to the component functions.

3. System testing, where some or all of the components in a system are integrated and the system is tested as a whole. System testing should focus on testing component interactions.

Development testing is primarily a defect testing process, where the aim of testing is to discover bugs in the software. It is therefore usually interleaved with debugging—the process of locating problems with the code and changing the program to fix these problems.

8.1.1 Unit testing

Unit testing is the process of testing program components, such as methods or object classes. Individual functions or methods are the simplest type of component. Your tests should be calls to these routines with different input parameters. You can use the approaches to test-case design discussed in Section 8.1.2 to design the function or method tests.

When you are testing object classes, you should design your tests to provide coverage of all of the features of the object. This means that you should test all operations associated with the object; set and check the value of all attributes associated with the object; and put the object into all possible states. This means that you should simulate all events that cause a state change.

Consider, for example, the weather station object from the example that I discussed in Chapter 7. The attributes and operations of this object are shown in Figure 8.4.


Figure 8.4 The weather station object interface: class WeatherStation, with the attribute identifier and the operations reportWeather(), reportStatus(), powerSave(instruments), remoteControl(commands), reconfigure(commands), restart(instruments), and shutdown(instruments)

It has a single attribute, which is its identifier. This is a constant that is set when the weather station is installed. You therefore only need a test that checks if it has been properly set up. You need to define test cases for all of the methods associated with the object, such as reportWeather and reportStatus. Ideally, you should test methods in isolation, but, in some cases, test sequences are necessary. For example, to test the method that shuts down the weather station instruments (shutdown), you need to have executed the restart method.

Generalization or inheritance makes object class testing more complicated. You can’t simply test an operation in the class where it is defined and assume that it will work as expected in all of the subclasses that inherit the operation. The operation that is inherited may make assumptions about other operations and attributes. These assumptions may not be valid in some subclasses that inherit the operation. You therefore have to test the inherited operation everywhere that it is used.

To test the states of the weather station, you can use a state model as discussed in Chapter 7 (Figure 7.8). Using this model, you identify sequences of state transitions that have to be tested and define event sequences to force these transitions. In principle, you should test every possible state transition sequence, although in practice this may be too expensive. Examples of state sequences that should be tested in the weather station include:

Shutdown → Running → Shutdown

Configuring → Running → Testing → Transmitting → Running

Running → Collecting → Running → Summarizing → Transmitting → Running

Whenever possible, you should automate unit testing. In automated unit testing, you make use of a test automation framework, such as JUnit (Tahchiev et al. 2010), to write and run your program tests. Unit testing frameworks provide generic test classes that you extend to create specific test cases. They can then run all of the tests that you have implemented and report, often through some graphical user interface (GUI), on the success or otherwise of the tests. An entire test suite can often be run in a few seconds, so it is possible to execute all tests every time you make a change to the program.

An automated test has three parts:

1. A setup part, where you initialize the system with the test case, namely, the inputs and expected outputs.


2. A call part, where you call the object or method to be tested.

3. An assertion part, where you compare the result of the call with the expected result. If the assertion evaluates to true, the test has been successful; if false, then it has failed.
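The three-part structure can be made concrete with a framework-free sketch in Python; the classify_temperature function is a hypothetical example invented for illustration, not code from the weather station system:

```python
# Function under test (a hypothetical example for illustration).
def classify_temperature(celsius):
    """Classify a temperature reading from an instrument."""
    if celsius < 0:
        return "freezing"
    if celsius < 25:
        return "normal"
    return "hot"

def test_classify_normal_temperature():
    # Setup part: define the test input and the expected output.
    reading = 18
    expected = "normal"
    # Call part: invoke the unit under test.
    actual = classify_temperature(reading)
    # Assertion part: compare the actual result with the expected result.
    assert actual == expected, f"expected {expected}, got {actual}"

test_classify_normal_temperature()
```

A framework such as JUnit packages the same three parts into test methods that a runner discovers and executes automatically.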

Sometimes, the object that you are testing has dependencies on other objects that may not have been implemented or whose use slows down the testing process. For example, if an object calls a database, this may involve a slow setup process before it can be used. In such cases, you may decide to use mock objects.

Mock objects are objects with the same interface as the external objects being used that simulate their functionality. For example, a mock object simulating a database may have only a few data items that are organized in an array. They can be accessed quickly, without the overheads of calling a database and accessing disks. Similarly, mock objects can be used to simulate abnormal operations or rare events. For example, if your system is intended to take action at certain times of day, your mock object can simply return those times, irrespective of the actual clock time.
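As a sketch of this idea, Python's unittest.mock library can stand in for a slow database and a real-time clock. The overnight_readings function and both of its dependencies are hypothetical examples, not interfaces from the book:

```python
from unittest.mock import Mock

# Production code (hypothetical): depends on a database and a clock,
# one slow and the other nondeterministic.
def overnight_readings(database, clock):
    """Return readings recorded before the current hour."""
    now = clock.current_hour()
    return [r for r in database.fetch_readings() if r["hour"] < now]

# Mock objects present the same interface as the real dependencies but
# return canned data, so the test is fast and repeatable.
mock_db = Mock()
mock_db.fetch_readings.return_value = [
    {"hour": 2, "value": 11.5},
    {"hour": 9, "value": 14.0},
]
mock_clock = Mock()
mock_clock.current_hour.return_value = 6  # always "6 a.m.", whatever the real time

assert overnight_readings(mock_db, mock_clock) == [{"hour": 2, "value": 11.5}]
```

The mock clock also shows how rare events are simulated: the test controls the "time of day" directly instead of waiting for it to occur.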

8.1.2 Choosing unit test cases

Testing is expensive and time consuming, so it is important that you choose effective unit test cases. Effectiveness, in this case, means two things:

1. The test cases should show that, when used as expected, the component that you are testing does what it is supposed to do.

2. If there are defects in the component, these should be revealed by test cases.

You should therefore design two kinds of test case. The first of these should reflect normal operation of a program and should show that the component works. For example, if you are testing a component that creates and initializes a new patient record, then your test case should show that the record exists in a database and that its fields have been set as specified. The other kind of test case should be based on testing experience of where common problems arise. It should use abnormal inputs to check that these are properly processed and do not crash the component.

Two strategies that can be effective in helping you choose test cases are:

1. Partition testing, where you identify groups of inputs that have common characteristics and should be processed in the same way. You should choose tests from within each of these groups.

2. Guideline-based testing, where you use testing guidelines to choose test cases. These guidelines reflect previous experience of the kinds of errors that programmers often make when developing components.


Figure 8.5 Equivalence partitioning (the set of possible inputs to the system contains input equivalence partitions; the set of possible outputs contains output partitions, among them the correct outputs)

The input data and output results of a program can be thought of as members of sets with common characteristics. Examples of these sets are positive numbers, negative numbers, and menu selections. Programs normally behave in a comparable way for all members of a set. That is, if you test a program that does a computation and requires two positive numbers, then you would expect the program to behave in the same way for all positive numbers.

Because of this equivalent behavior, these classes are sometimes called equivalence partitions or domains (Beizer 1990). One systematic approach to test-case design is based on identifying all input and output partitions for a system or component. Test cases are designed so that the inputs or outputs lie within these partitions.

Partition testing can be used to design test cases for both systems and components. In Figure 8.5, the large shaded ellipse on the left represents the set of all possible inputs to the program that is being tested. The smaller unshaded ellipses represent equivalence partitions. A program being tested should process all of the members of an input equivalence partition in the same way.

Output equivalence partitions are partitions within which all of the outputs have something in common. Sometimes there is a 1:1 mapping between input and output equivalence partitions. However, this is not always the case; you may need to define a separate input equivalence partition, where the only common characteristic of the inputs is that they generate outputs within the same output partition. The shaded area in the left ellipse represents inputs that are invalid. The shaded area in the right ellipse represents exceptions that may occur, that is, responses to invalid inputs.

Once you have identified a set of partitions, you choose test cases from each of these partitions. A good rule of thumb for test-case selection is to choose test cases on the boundaries of the partitions, plus cases close to the midpoint of the partition. The reason for this is that designers and programmers tend to consider typical values of inputs when developing a system. You test these by choosing the midpoint of the partition. Boundary values are often atypical (e.g., zero may behave differently from other non-negative numbers) and so are sometimes overlooked by developers. Program failures often occur when processing these atypical values.


Figure 8.6 Equivalence partitions

Number of input values — partitions: Less than 4, Between 4 and 10, More than 10; test values: 3, 4, 7, 10, 11

Input values — partitions: Less than 10000, Between 10000 and 99999, More than 99999; test values: 9999, 10000, 50000, 99999, 100000

You identify partitions by using the program specification or user documentation and from experience where you predict the classes of input value that are likely to detect errors. For example, say a program specification states that the program accepts four to eight inputs which are five-digit integers greater than 10,000. You use this information to identify the input partitions and possible test input values. These are shown in Figure 8.6.
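A sketch of tests derived from these partitions, using a hypothetical validator for the stated specification (four to eight inputs of five-digit integers). The validator and the choice to treat 10,000 and 99,999 as legal endpoints are assumptions made for illustration; probing exactly such endpoint decisions is what the boundary tests are for:

```python
def valid_input_set(values):
    """Hypothetical validator for the example specification: 4 to 8 inputs,
    each a five-digit integer (taken here as 10,000 to 99,999 inclusive)."""
    if not 4 <= len(values) <= 8:
        return False
    return all(10000 <= v <= 99999 for v in values)

# Midpoint case: the typical values that designers have in mind.
assert valid_input_set([50000, 60000, 70000, 80000]) is True

# Boundary cases for the number of inputs:
assert valid_input_set([50000] * 3) is False   # one below the lower bound
assert valid_input_set([50000] * 4) is True    # lower bound
assert valid_input_set([50000] * 8) is True    # upper bound
assert valid_input_set([50000] * 9) is False   # one above the upper bound

# Boundary cases for the input values themselves:
assert valid_input_set([9999, 50000, 50000, 50000]) is False
assert valid_input_set([10000, 50000, 50000, 50000]) is True
assert valid_input_set([99999, 50000, 50000, 50000]) is True
assert valid_input_set([100000, 50000, 50000, 50000]) is False
```

Each partition contributes one midpoint case and the values on either side of its boundaries, following the rule of thumb above.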

When you use the specification of a system to identify equivalence partitions, this is called black-box testing. You don’t need any knowledge of how the system works. It is sometimes useful to supplement the black-box tests with “white-box testing,” where you look at the code of the program to find other possible tests. For example, your code may include exceptions to handle incorrect inputs. You can use this knowledge to identify “exception partitions”—different ranges where the same exception handling should be applied.

Equivalence partitioning is an effective approach to testing because it helps account for errors that programmers often make when processing inputs at the edges of partitions. You can also use testing guidelines to help choose test cases. Guidelines encapsulate knowledge of what kinds of test cases are effective for discovering errors. For example, when you are testing programs with sequences, arrays, or lists, guidelines that could help reveal defects include:

1. Test software with sequences that have only a single value. Programmers naturally think of sequences as made up of several values, and sometimes they embed this assumption in their programs. Consequently, if presented with a single-value sequence, a program may not work properly.

2. Use different sequences of different sizes in different tests. This decreases the chances that a program with defects will accidentally produce a correct output because of some accidental characteristics of the input.

3. Derive tests so that the first, middle, and last elements of the sequence are accessed. This approach reveals problems at partition boundaries.
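The three guidelines translate directly into concrete test cases. The sketch below applies them to a hypothetical reading_range function, invented for illustration:

```python
# Hypothetical function under test: the range (max minus min) of a
# sequence of sensor readings.
def reading_range(readings):
    if not readings:
        raise ValueError("no readings")
    return max(readings) - min(readings)

# Guideline 1: a single-value sequence.
assert reading_range([7]) == 0

# Guideline 2: sequences of different sizes in different tests.
assert reading_range([3, 9]) == 6
assert reading_range([5, 1, 8, 2, 2]) == 7

# Guideline 3: make sure the first, middle, and last elements are
# accessed; here the extreme value moves through each position.
assert reading_range([10, 4, 6]) == 6   # maximum is the first element
assert reading_range([4, 10, 6]) == 6   # maximum is the middle element
assert reading_range([4, 6, 10]) == 6   # maximum is the last element
```

A defective implementation that, say, skipped the first element would pass the first two groups of tests but fail the third.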


Path testing

Path testing is a testing strategy that aims to exercise every independent execution path through a component or program. If every independent path is executed, then all statements in the component must have been executed at least once. All conditional statements are tested for both true and false cases. In an object-oriented development process, path testing may be used to test the methods associated with objects.

http://software-engineering-book.com/web/path-testing/

Whittaker’s book (Whittaker 2009) includes many examples of guidelines that can be used in test-case design. Some of the most general guidelines that he suggests are:

Choose inputs that force the system to generate all error messages.

Design inputs that cause input buffers to overflow.

Repeat the same input or series of inputs numerous times.

Force invalid outputs to be generated.

Force computation results to be too large or too small.

As you gain experience with testing, you can develop your own guidelines about how to choose effective test cases. I give more examples of testing guidelines in the next section.

8.1.3 Component testing

Software components are often made up of several interacting objects. For example, in the weather station system, the reconfiguration component includes objects that deal with each aspect of the reconfiguration. You access the functionality of these objects through component interfaces (see Chapter 7). Testing composite components should therefore focus on showing that the component interface or interfaces behave according to their specification. You can assume that unit tests on the individual objects within the component have been completed.

Figure 8.7 illustrates the idea of component interface testing. Assume that components A, B, and C have been integrated to create a larger component or subsystem. The test cases are not applied to the individual components but rather to the interface of the composite component created by combining these components. Interface errors in the composite component may not be detectable by testing the individual objects because these errors result from interactions between the objects in the component.

There are different types of interface between program components and, consequently, different types of interface error that can occur:

1. Parameter interfaces These are interfaces in which data or sometimes function references are passed from one component to another. Methods in an object have a parameter interface.


Figure 8.7 Interface testing (test cases are applied to the interface of a composite component made up of components A, B, and C)

2. Shared memory interfaces These are interfaces in which a block of memory is shared between components. Data is placed in the memory by one subsystem and retrieved from there by other subsystems. This type of interface is used in embedded systems, where sensors create data that is retrieved and processed by other system components.

3. Procedural interfaces These are interfaces in which one component encapsulates a set of procedures that can be called by other components. Objects and reusable components have this form of interface.

4. Message passing interfaces These are interfaces in which one component requests a service from another component by passing a message to it. A return message includes the results of executing the service. Some object-oriented systems have this form of interface, as do client–server systems.

Interface errors are one of the most common forms of error in complex systems (Lutz 1993). These errors fall into three classes:

Interface misuse A calling component calls some other component and makes an error in the use of its interface. This type of error is common in parameter interfaces, where parameters may be of the wrong type or be passed in the wrong order, or the wrong number of parameters may be passed.

Interface misunderstanding A calling component misunderstands the specification of the interface of the called component and makes assumptions about its behavior. The called component does not behave as expected, which then causes unexpected behavior in the calling component. For example, a binary search method may be called with a parameter that is an unordered array. The search would then fail.

Timing errors These occur in real-time systems that use a shared memory or a message-passing interface. The producer of data and the consumer of data may


operate at different speeds. Unless particular care is taken in the interface design, the consumer can access out-of-date information because the producer of the information has not updated the shared interface information.

Testing for interface defects is difficult because some interface faults may only manifest themselves under unusual conditions. For example, say an object implements a queue as a fixed-length data structure. A calling object may assume that the queue is implemented as an infinite data structure, and so it does not check for queue overflow when an item is entered.

This condition can only be detected during testing by designing a sequence of test cases that force the queue to overflow. The tests should check how calling objects handle that overflow. However, as this is a rare condition, testers may think that this isn’t worth checking when writing the test set for the queue object.
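The overflow scenario can be forced deliberately in a test. The BoundedQueue class below is a hypothetical fixed-length queue invented for illustration; the point is that the test drives it into the rare overflow condition on purpose:

```python
class BoundedQueue:
    """A hypothetical fixed-length queue. put() raises OverflowError when
    the queue is full rather than silently dropping data."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def put(self, item):
        if len(self.items) >= self.capacity:
            raise OverflowError("queue is full")
        self.items.append(item)

    def get(self):
        return self.items.pop(0)

# The interface test forces the overflow condition that a calling object
# might never trigger in normal operation.
q = BoundedQueue(capacity=3)
for i in range(3):
    q.put(i)
try:
    q.put(99)            # a fourth item must overflow
    assert False, "expected OverflowError"
except OverflowError:
    pass
assert q.get() == 0      # earlier items are unaffected by the failed put
```

A companion test on the calling object would then check that it handles the OverflowError sensibly instead of assuming an infinite queue.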

A further problem may arise because of interactions between faults in different modules or objects. Faults in one object may only be detected when some other object behaves in an unexpected way. Say an object calls another object to receive some service and the calling object assumes that the response is correct. If the called service is faulty in some way, the returned value may be valid but incorrect. The problem is therefore not immediately detectable but only becomes obvious when some later computation, using the returned value, goes wrong.

Some general guidelines for interface testing are:

1. Examine the code to be tested and identify each call to an external component. Design a set of tests in which the values of the parameters to the external components are at the extreme ends of their ranges. These extreme values are most likely to reveal interface inconsistencies.

2. Where pointers are passed across an interface, always test the interface with null pointer parameters.

3. Where a component is called through a procedural interface, design tests that deliberately cause the component to fail. Differing failure assumptions are one of the most common specification misunderstandings.

4. Use stress testing in message passing systems. This means that you should design tests that generate many more messages than are likely to occur in practice. This is an effective way of revealing timing problems.

5. Where several components interact through shared memory, design tests that vary the order in which these components are activated. These tests may reveal implicit assumptions made by the programmer about the order in which the shared data is produced and consumed.

Sometimes it is better to use inspections and reviews rather than testing to look for interface errors. Inspections can concentrate on component interfaces, and questions about the assumed interface behavior can be asked during the inspection process.


8.1.4 System testing

System testing during development involves integrating components to create a version of the system and then testing the integrated system. System testing checks that components are compatible, interact correctly, and transfer the right data at the right time across their interfaces. It obviously overlaps with component testing, but there are two important differences:

1. During system testing, reusable components that have been separately developed and off-the-shelf systems may be integrated with newly developed components. The complete system is then tested.

2. Components developed by different team members or subteams may be integrated at this stage. System testing is a collective rather than an individual process. In some companies, system testing may involve a separate testing team with no involvement from designers and programmers.

All systems have emergent behavior. This means that some system functionality and characteristics only become obvious when you put the components together. This may be planned emergent behavior, which has to be tested. For example, you may integrate an authentication component with a component that updates the system database. You then have a system feature that restricts information updating to authorized users. Sometimes, however, the emergent behavior is unplanned and unwanted. You have to develop tests that check that the system is only doing what it is supposed to do.

System testing should focus on testing the interactions between the components and objects that make up a system. You may also test reusable components or systems to check that they work as expected when they are integrated with new components. This interaction testing should discover those component bugs that are only revealed when a component is used by other components in the system. Interaction testing also helps find misunderstandings, made by component developers, about other components in the system.

Because of its focus on interactions, use case-based testing is an effective approach to system testing. Several components or objects normally implement each use case in the system. Testing the use case forces these interactions to occur. If you have developed a sequence diagram to model the use case implementation, you can see the objects or components that are involved in the interaction.

In the wilderness weather station example, the system software reports summarized weather data to a remote computer, as described in Figure 7.3. Figure 8.8 shows the sequence of operations in the weather station when it responds to a request to collect data for the mapping system. You can use this diagram to identify operations that will be tested and to help design the test cases to execute the tests. Therefore, issuing a request for a report will result in the execution of the following thread of methods:

SatComms:request → WeatherStation:reportWeather → Commslink:Get(summary) → WeatherData:summarize


Figure 8.8 Collect weather data sequence chart (the weather information system sends request (report) to SatComms, which acknowledges and calls reportWeather () on WeatherStation; the station acknowledges and calls get (summary) on Commslink, which calls summarise () on WeatherData; the report is then sent and acknowledged back along the chain with send (report) and reply (report))

The sequence diagram helps you design the specific test cases that you need, as it shows what inputs are required and what outputs are created:

1. An input of a request for a report should have an associated acknowledgment. A report should ultimately be returned from the request. During testing, you should create summarized data that can be used to check that the report is correctly organized.

2. An input request for a report to WeatherStation results in a summarized report being generated. You can test this in isolation by creating raw data corresponding to the summary that you have prepared for the test of SatComms and checking that the WeatherStation object correctly produces this summary. This raw data is also used to test the WeatherData object.

Of course, I have simplified the sequence diagram in Figure 8.8 so that it does not show exceptions. A complete use case/scenario test must take these exceptions into account and ensure that they are correctly handled.
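One way to script such an interaction test is to record the calls between the objects, so the test checks the thread of method calls and not just the final report. The sketch below is a loose Python approximation of part of the Figure 8.8 interaction using mock objects; handle_request and the method names are simplified assumptions, not the book's actual interfaces:

```python
from unittest.mock import Mock

# Hypothetical wiring of part of the weather station interaction:
# a request causes a summary to be fetched and a report produced.
def handle_request(weather_station, comms_link):
    summary = comms_link.get("summary")
    return weather_station.report_weather(summary)

comms_link = Mock()
comms_link.get.return_value = "summarized-data"
weather_station = Mock()
weather_station.report_weather.return_value = "report"

assert handle_request(weather_station, comms_link) == "report"
# Verify the interaction thread, not just the final output:
comms_link.get.assert_called_once_with("summary")
weather_station.report_weather.assert_called_once_with("summarized-data")
```

The same pattern extends to exception scenarios: a mock can be configured to raise an error so the test checks that the calling object handles the failure correctly.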

For most systems, it is difficult to know how much system testing is essential and when you should stop testing. Exhaustive testing, where every possible program execution sequence is tested, is impossible. Testing, therefore, has to be based on a subset of possible test cases. Ideally, software companies should have policies for choosing this subset. These policies might be based on general testing policies, such as a policy that all program statements should be executed at least once. Alternatively, they may be based on experience of system usage and focus on testing the features of the operational system. For example:


Incremental integration and testing

System testing involves integrating different components, then testing the integrated system that you have created. You should always use an incremental approach to integration and testing where you integrate a component, test the system, integrate another component, test again, and so on. If problems occur, they are probably due to interactions with the most recently integrated component.

Incremental integration and testing is fundamental to agile methods, where regression tests are run every time a new increment is integrated.

http://software-engineering-book.com/web/integration/

1. All system functions that are accessed through menus should be tested.

2. Combinations of functions (e.g., text formatting) that are accessed through the same menu must be tested.

3. Where user input is provided, all functions must be tested with both correct and incorrect input.

It is clear from experience with major software products such as word processors or spreadsheets that similar guidelines are normally used during product testing. When features of the software are used in isolation, they normally work. Problems arise, as Whittaker explains (Whittaker 2009), when combinations of less commonly used features have not been tested together. He gives the example of how, in a commonly used word processor, using footnotes with multicolumn layout causes incorrect layout of the text.

Automated system testing is usually more difficult than automated unit or component testing. Automated unit testing relies on predicting the outputs and then encoding these predictions in a program. The prediction is then compared with the result. However, the point of implementing a system may be to generate outputs that are large or cannot be easily predicted. You may be able to examine an output and check its credibility without necessarily being able to create it in advance.

8.2 Test-driven development

Test-driven development (TDD) is an approach to program development in which you interleave testing and code development (Beck 2002; Jeffries and Melnik 2007). You develop the code incrementally, along with a set of tests for that increment. You don’t start working on the next increment until the code that you have developed passes all of its tests. Test-driven development was introduced as part of the XP agile development method. However, it has now gained mainstream acceptance and may be used in both agile and plan-based processes.

Figure 8.9 Test-driven development: identify new functionality, write a test, and run all tests; on a failure, implement the functionality and refactor; on a pass, move on to identifying the next item of functionality.

The fundamental TDD process is shown in Figure 8.9. The steps in the process are as follows:

1. You start by identifying the increment of functionality that is required. This should normally be small and implementable in a few lines of code.

2. You write a test for this functionality and implement it as an automated test. This means that the test can be executed and will report whether it has passed or failed.

3. You then run the test, along with all other tests that have been implemented. Initially, you have not implemented the functionality, so the new test will fail. This is deliberate as it shows that the test adds something to the test set.

4. You then implement the functionality and re-run the test. This may involve refactoring existing code to improve it and add new code to what's already there.

5. Once all tests run successfully, you move on to implementing the next chunk of functionality.
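The steps above can be sketched as a miniature TDD increment. This is an illustrative sketch only: the `word_count` function is an invented example, not from the text. The test is written first and would initially fail; the functionality is then implemented to make it pass.

```python
# Step 2: write an automated test for the new increment of
# functionality before the functionality itself exists.
def test_word_count():
    assert word_count("the quick brown fox") == 4
    assert word_count("") == 0

# Step 3: running test_word_count() at this point fails (NameError),
# which shows that the test adds something to the test set.

# Step 4: implement the functionality so that the test passes.
def word_count(text):
    """Count whitespace-separated words in a string."""
    return len(text.split())

test_word_count()  # Step 5: all tests pass; move to the next increment.
```

In practice the test would live in a test class run by a framework such as JUnit (for Java) rather than being called directly.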

An automated testing environment, such as the JUnit environment that supports Java program testing (Tahchiev et al. 2010), is essential for TDD. As the code is developed in very small increments, you have to be able to run every test each time that you add functionality or refactor the program. Therefore, the tests are embedded in a separate program that runs the tests and invokes the system that is being tested. Using this approach, you can run hundreds of separate tests in a few seconds.

Test-driven development helps programmers clarify their ideas of what a code segment is actually supposed to do. To write a test, you need to understand what is intended; this understanding makes it easier to write the required code. Of course, if you have incomplete knowledge or understanding, then TDD won't help. If you don't know enough to write the tests, you won't develop the required code.

For example, if your computation involves division, you should check that you are not dividing the numbers by zero. If you forget to write a test for this, then the checking code will never be included in the program.
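As a sketch of this point (the `mean` function is hypothetical, not part of any system in the book), the divide-by-zero guard below exists only because a test demands it:

```python
def mean(values):
    """Arithmetic mean of a sequence, defined as 0.0 for an empty one."""
    if not values:  # the guard that a forgotten test would leave unwritten
        return 0.0
    return sum(values) / len(values)

def test_mean():
    assert mean([2, 4, 6]) == 4.0
    # Without this test case, nothing forces the empty-sequence check,
    # and mean([]) would raise ZeroDivisionError.
    assert mean([]) == 0.0

test_mean()
```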

As well as better problem understanding, other benefits of test-driven development are:

1. Code coverage In principle, every code segment that you write should have at least one associated test. Therefore, you can be confident that all of the code in the system has actually been executed. Code is tested as it is written, so defects are discovered early in the development process.

2. Regression testing A test suite is developed incrementally as a program is developed. You can always run regression tests to check that changes to the program have not introduced new bugs.

3. Simplified debugging When a test fails, it should be obvious where the problem lies. The newly written code needs to be checked and modified. You do not need to use debugging tools to locate the problem. Reports of the use of TDD suggest that it is hardly ever necessary to use an automated debugger in test-driven development (Martin 2007).

4. System documentation The tests themselves act as a form of documentation that describes what the code should be doing. Reading the tests can make it easier to understand the code.

One of the most important benefits of TDD is that it reduces the costs of regression testing. Regression testing involves running test sets that have successfully executed after changes have been made to a system. The regression test checks that these changes have not introduced new bugs into the system and that the new code interacts as expected with the existing code. Regression testing is expensive and sometimes impractical when a system is manually tested, as the costs in time and effort are very high. You have to try to choose the most relevant tests to re-run, and it is easy to miss important tests.

Automated testing dramatically reduces the costs of regression testing. Existing tests may be re-run quickly and cheaply. After making a change to a system in test-first development, all existing tests must run successfully before any further functionality is added. As a programmer, you can be confident that the new functionality that you have added has not caused or revealed problems with existing code.
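In Python, the standard `unittest` framework plays the role that JUnit plays for Java. A minimal sketch (the `slugify` function under test is invented for illustration): the tests live in a separate test class, and the whole suite is re-run after every change.

```python
import unittest

def slugify(title):
    """Code under test: turn a title into a lowercase hyphenated slug."""
    return "-".join(title.lower().split())

class SlugifyRegressionTests(unittest.TestCase):
    def test_simple_title(self):
        self.assertEqual(slugify("Software Testing"), "software-testing")

    def test_extra_whitespace(self):
        self.assertEqual(slugify("  Release   Testing "), "release-testing")

# Re-running the whole suite after each change is quick and cheap, so any
# change that breaks existing behavior is caught immediately.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyRegressionTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert result.wasSuccessful()
```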

Test-driven development is of most value in new software development where the functionality is either implemented in new code or by using components from standard libraries. If you are reusing large code components or legacy systems, then you need to write tests for these systems as a whole. You cannot easily decompose them into separate testable elements, so incremental test-driven development is impractical. Test-driven development may also be ineffective with multithreaded systems. The different threads may be interleaved at different times in different test runs, and so may produce different results.

If you use TDD, you still need a system testing process to validate the system, that is, to check that it meets the requirements of all of the system stakeholders. System testing also tests performance and reliability, and checks that the system does not do things that it shouldn't do, such as produce unwanted outputs. Andrea (Andrea 2007) suggests how testing tools can be extended to integrate some aspects of system testing with TDD.

Test-driven development is now a widely used and mainstream approach to software testing. Most programmers who have adopted this approach are happy with it and find it a more productive way to develop software. It is also claimed that use of TDD encourages better structuring of a program and improved code quality. However, experiments to verify this claim have been inconclusive.

8.3 Release testing

Release testing is the process of testing a particular release of a system that is intended for use outside of the development team. Normally, the system release is for customers and users. In a complex project, however, the release could be for other teams that are developing related systems. For software products, the release could be for product management who then prepare it for sale.

There are two important distinctions between release testing and system testing during the development process:

1. The system development team should not be responsible for release testing.

2. Release testing is a process of validation checking to ensure that a system meets its requirements and is good enough for use by system customers. System testing by the development team should focus on discovering bugs in the system (defect testing).

The primary goal of the release testing process is to convince the supplier of the system that it is good enough for use. If so, it can be released as a product or delivered to the customer. Release testing, therefore, has to show that the system delivers its specified functionality, performance, and dependability, and that it does not fail during normal use.

Release testing is usually a black-box testing process whereby tests are derived from the system specification. The system is treated as a black box whose behavior can only be determined by studying its inputs and the related outputs. Another name for this is functional testing, so-called because the tester is only concerned with functionality and not the implementation of the software.

8.3.1 Requirements-based testing

A general principle of good requirements engineering practice is that requirements should be testable. That is, the requirement should be written so that a test can be designed for that requirement. A tester can then check that the requirement has been satisfied. Requirements-based testing, therefore, is a systematic approach to test-case design where you consider each requirement and derive a set of tests for it. Requirements-based testing is validation rather than defect testing—you are trying to demonstrate that the system has properly implemented its requirements.


For example, consider the following Mentcare system requirements that are concerned with checking for drug allergies:

If a patient is known to be allergic to any particular medication, then prescription of that medication shall result in a warning message being issued to the system user.

If a prescriber chooses to ignore an allergy warning, he or she shall provide a reason why this has been ignored.

To check if these requirements have been satisfied, you may need to develop several related tests:

1. Set up a patient record with no known allergies. Prescribe medication for allergies that are known to exist. Check that a warning message is not issued by the system.

2. Set up a patient record with a known allergy. Prescribe the medication that the patient is allergic to and check that the warning is issued by the system.

3. Set up a patient record in which allergies to two or more drugs are recorded. Prescribe both of these drugs separately and check that the correct warning for each drug is issued.

4. Prescribe two drugs that the patient is allergic to. Check that two warnings are correctly issued.

5. Prescribe a drug that issues a warning and overrule that warning. Check that the system requires the user to provide information explaining why the warning was overruled.
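Tests like these can be expressed as automated checks against a test double of the system. The sketch below invents a minimal `PatientRecord`/`prescribe` interface purely for illustration; the actual Mentcare interfaces are not specified in the text.

```python
# Invented stand-ins for the Mentcare prescribing interface.
class PatientRecord:
    def __init__(self, allergies=()):
        self.allergies = set(allergies)

def prescribe(record, drug, override_reason=None):
    """Return (warnings, accepted): one warning per allergy match, and
    acceptance only if any warning is overridden with a stated reason."""
    warnings = [f"ALLERGY WARNING: {drug}"] if drug in record.allergies else []
    accepted = not warnings or override_reason is not None
    return warnings, accepted

# Test 1: no known allergies -> no warning is issued.
assert prescribe(PatientRecord(), "aspirin") == ([], True)

# Test 2: a known allergy -> the warning is issued.
warnings, accepted = prescribe(PatientRecord({"aspirin"}), "aspirin")
assert warnings == ["ALLERGY WARNING: aspirin"] and not accepted

# Test 5: overruling the warning requires an explanatory reason.
warnings, accepted = prescribe(PatientRecord({"aspirin"}), "aspirin",
                               override_reason="no suitable alternative")
assert warnings and accepted
```

Tests 3 and 4 would follow the same pattern, with records holding two or more allergies.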

You can see from this list that testing a requirement does not mean just writing a single test. You normally have to write several tests to ensure that you have coverage of the requirement. You should also keep traceability records of your requirements-based testing, which link the tests to the specific requirements that you have tested.

8.3.2 Scenario testing

Scenario testing is an approach to release testing whereby you devise typical scenarios of use and use these scenarios to develop test cases for the system. A scenario is a story that describes one way in which the system might be used. Scenarios should be realistic, and real system users should be able to relate to them. If you have used scenarios or user stories as part of the requirements engineering process (described in Chapter 4), then you may be able to reuse them as testing scenarios.

In a short paper on scenario testing, Kaner (Kaner 2003) suggests that a scenario test should be a narrative story that is credible and fairly complex. It should motivate stakeholders; that is, they should relate to the scenario and believe that it is important that the system passes the test. He also suggests that it should be easy to evaluate. If there are problems with the system, then the release testing team should recognize them.

George is a nurse who specializes in mental health care. One of his responsibilities is to visit patients at home to check that their treatment is effective and that they are not suffering from medication side effects.

On a day for home visits, George logs into the Mentcare system and uses it to print his schedule of home visits for that day, along with summary information about the patients to be visited. He requests that the records for these patients be downloaded to his laptop. He is prompted for his key phrase to encrypt the records on the laptop.

One of the patients whom he visits is Jim, who is being treated with medication for depression. Jim feels that the medication is helping him but believes that it has the side effect of keeping him awake at night. George looks up Jim's record and is prompted for his key phrase to decrypt the record. He checks the drug prescribed and queries its side effects. Sleeplessness is a known side effect, so he notes the problem in Jim's record and suggests that he visit the clinic to have his medication changed. Jim agrees, so George enters a prompt to call him when he gets back to the clinic to make an appointment with a physician. George ends the consultation, and the system re-encrypts Jim's record.

After finishing his consultations, George returns to the clinic and uploads the records of patients visited to the database. The system generates a call list for George of those patients whom he has to contact for follow-up information and make clinic appointments.

Figure 8.10 A user story for the Mentcare system

As an example of a possible scenario from the Mentcare system, Figure 8.10 describes one way that the system may be used on a home visit. This scenario tests a number of features of the Mentcare system:

1. Authentication by logging on to the system.

2. Downloading and uploading of specified patient records to a laptop.

3. Home visit scheduling.

4. Encryption and decryption of patient records on a mobile device.

5. Record retrieval and modification.

6. Links with the drugs database that maintains side-effect information.

7. The system for call prompting.

If you are a release tester, you run through this scenario, playing the role of George and observing how the system behaves in response to different inputs. As George, you may make deliberate mistakes, such as inputting the wrong key phrase to decode records. This checks the response of the system to errors. You should carefully note any problems that arise, including performance problems. If a system is too slow, this will change the way that it is used. For example, if it takes too long to encrypt a record, then users who are short of time may skip this stage. If they then lose their laptop, an unauthorized person could view the patient records.

When you use a scenario-based approach, you are normally testing several requirements within the same scenario. Therefore, as well as checking individual requirements, you are also checking that combinations of requirements do not cause problems.


8.3.3 Performance testing

Once a system has been completely integrated, it is possible to test for emergent properties, such as performance and reliability. Performance tests have to be designed to ensure that the system can process its intended load. This usually involves running a series of tests where you increase the load until the system performance becomes unacceptable.

As with other types of testing, performance testing is concerned both with demonstrating that the system meets its requirements and discovering problems and defects in the system. To test whether performance requirements are being achieved, you may have to construct an operational profile. An operational profile (see Chapter 11) is a set of tests that reflect the actual mix of work that will be handled by the system. Therefore, if 90% of the transactions in a system are of type A, 5% of type B, and the remainder of types C, D, and E, then you have to design the operational profile so that the vast majority of tests are of type A. Otherwise, you will not get an accurate test of the operational performance of the system.
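A sketch of how such a profile might drive test generation follows. The transaction types are placeholders; the percentages are those from the example above, with the 5% remainder split arbitrarily across C, D, and E.

```python
import random

# Operational profile from the example: 90% type A, 5% type B,
# and the remaining 5% split (arbitrarily) across types C, D, and E.
PROFILE = {"A": 0.90, "B": 0.05, "C": 0.02, "D": 0.02, "E": 0.01}

def operational_test_mix(n, seed=0):
    """Draw n transaction types with frequencies matching the profile."""
    rng = random.Random(seed)
    types, weights = zip(*PROFILE.items())
    return rng.choices(types, weights=weights, k=n)

mix = operational_test_mix(10_000)
# The vast majority of generated tests exercise type A transactions,
# mirroring the operational workload.
assert mix.count("A") > 8_500
```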

This approach, of course, is not necessarily the best approach for defect testing. Experience has shown that an effective way to discover defects is to design tests around the limits of the system. In performance testing, this means stressing the system by making demands that are outside the design limits of the software. This is known as stress testing.

Say you are testing a transaction processing system that is designed to process up to 300 transactions per second. You start by testing this system with fewer than 300 transactions per second. You then gradually increase the load on the system beyond 300 transactions per second until it is well beyond the maximum design load of the system and the system fails.
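A stress-test driver for this example can be sketched as a simple load ramp. Everything here is illustrative: `process_batch` stands in for submitting a second's worth of transactions to the real system, and its capacity of 450 tps is an invented figure for the point at which the system actually fails.

```python
DESIGN_LIMIT = 300  # transactions/second the system is designed to handle

def process_batch(tps, capacity=450):
    """Stand-in for driving the system at a given load; a real system
    would fail somewhere past its design limit (450 tps is invented)."""
    return tps <= capacity

def stress_test(start=100, step=50):
    """Ramp the load from below the design limit until the system fails;
    return the highest load that was handled successfully."""
    tps, last_ok = start, None
    while process_batch(tps):
        last_ok = tps
        tps += step
    return last_ok

breaking_point = stress_test()
assert breaking_point >= DESIGN_LIMIT  # failure occurs beyond the design load
```

Recording where the ramp fails tells you how far beyond the design load the system degrades gracefully.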

Stress testing helps you do two things:

1. Test the failure behavior of the system. Circumstances may arise through an unexpected combination of events where the load placed on the system exceeds the maximum anticipated load. In these circumstances, system failure should not cause data corruption or unexpected loss of user services. Stress testing checks that overloading the system causes it to "fail-soft" rather than collapse under its load.

2. Reveal defects that only show up when the system is fully loaded. Although it can be argued that these defects are unlikely to cause system failures in normal use, there may be unusual combinations of circumstances that the stress testing replicates.

Stress testing is particularly relevant to distributed systems based on a network of processors. These systems often exhibit severe degradation when they are heavily loaded. The network becomes swamped with coordination data that the different processes must exchange. The processes become slower and slower as they wait for the required data from other processes. Stress testing helps you discover when the degradation begins so that you can add checks to the system to reject transactions beyond this point.


8.4 User testing

User or customer testing is a stage in the testing process in which users or customers provide input and advice on system testing. This may involve formally testing a system that has been commissioned from an external supplier. Alternatively, it may be an informal process where users experiment with a new software product to see if they like it and to check that it does what they need. User testing is essential, even when comprehensive system and release testing have been carried out. Influences from the user's working environment can have a major effect on the reliability, performance, usability, and robustness of a system.

It is practically impossible for a system developer to replicate the system's working environment, as tests in the developer's environment are inevitably artificial. For example, a system that is intended for use in a hospital is used in a clinical environment where other things are going on, such as patient emergencies and conversations with relatives. These all affect the use of a system, but developers cannot include them in their testing environment.

There are three different types of user testing:

1. Alpha testing, where a selected group of software users work closely with the development team to test early releases of the software.

2. Beta testing, where a release of the software is made available to a larger group of users to allow them to experiment and to raise problems that they discover with the system developers.

3. Acceptance testing, where customers test a system to decide whether or not it is ready to be accepted from the system developers and deployed in the customer environment.

In alpha testing, users and developers work together to test a system as it is being developed. This means that the users can identify problems and issues that are not readily apparent to the development testing team. Developers can only really work from the requirements, but these often do not reflect other factors that affect the practical use of the software. Users can therefore provide information about practice that helps with the design of more realistic tests.

Alpha testing is often used when developing software products or apps. Experienced users of these products may be willing to get involved in the alpha testing process because this gives them early information about new system features that they can exploit. It also reduces the risk that unanticipated changes to the software will have disruptive effects on their business. However, alpha testing may also be used when custom software is being developed. Agile development methods advocate user involvement in the development process and suggest that users should play a key role in designing tests for the system.

Beta testing takes place when an early, sometimes unfinished, release of a software system is made available to a larger group of customers and users for evaluation. Beta testers may be a selected group of customers who are early adopters of the system.


Alternatively, the software may be made publicly available for use by anyone who is interested in experimenting with it.

Figure 8.11 The acceptance testing process: define acceptance criteria, plan acceptance testing, derive acceptance tests, run acceptance tests, negotiate test results, and accept or reject the system. The associated artifacts are the test criteria, test plan, tests, test results, and testing report.

Beta testing is mostly used for software products that are used in many different settings. This is important because, unlike custom software, there is no way for the product developer to limit the software's operating environment. It is impossible for product developers to know and replicate all the settings in which the software product will be used. Beta testing is therefore used to discover interaction problems between the software and features of its operational environment. Beta testing is also a form of marketing. Customers learn about their system and what it can do for them.

Acceptance testing is an inherent part of custom systems development. Customers test a system, using their own data, and decide if it should be accepted from the system developer. Acceptance implies that final payment should be made for the software.

Figure 8.11 shows that there are six stages in the acceptance testing process:

1. Define acceptance criteria This stage should ideally take place early in the process before the contract for the system is signed. The acceptance criteria should be part of the system contract and be approved by the customer and the developer. In practice, however, it can be difficult to define criteria so early in the process. Detailed requirements may not be available, and the requirements will almost certainly change during the development process.

2. Plan acceptance testing This stage involves deciding on the resources, time, and budget for acceptance testing and establishing a testing schedule. The acceptance test plan should also discuss the required coverage of the requirements and the order in which system features are tested. It should define risks to the testing process, such as system crashes and inadequate performance, and discuss how these risks can be mitigated.

3. Derive acceptance tests Once acceptance criteria have been established, tests have to be designed to check whether or not a system is acceptable. Acceptance tests should aim to test both the functional and non-functional characteristics (e.g., performance) of the system. They should ideally provide complete coverage of the system requirements. In practice, it is difficult to establish completely objective acceptance criteria. There is often scope for argument about whether or not a test shows that a criterion has definitely been met.

4. Run acceptance tests The agreed acceptance tests are executed on the system. Ideally, this step should take place in the actual environment where the system will be used, but this may be disruptive and impractical. Therefore, a user testing environment may have to be set up to run these tests. It is difficult to automate this process, as part of the acceptance tests may involve testing the interactions between end-users and the system. Some training of end-users may be required.

5. Negotiate test results It is very unlikely that all of the defined acceptance tests will pass and that there will be no problems with the system. If this is the case, then acceptance testing is complete and the system can be handed over. More commonly, some problems will be discovered. In such cases, the developer and the customer have to negotiate to decide if the system is good enough to be used. They must also agree on how the developer will fix the identified problems.

6. Reject/accept system This stage involves a meeting between the developers and the customer to decide on whether or not the system should be accepted. If the system is not good enough for use, then further development is required to fix the identified problems. Once complete, the acceptance testing phase is repeated.

You might think that acceptance testing is a clear-cut contractual issue. If a system does not pass its acceptance tests, then it should not be accepted and payment should not be made. However, the reality is more complex. Customers want to use the software as soon as they can because of the benefits of its immediate deployment. They may have bought new hardware, trained staff, and changed their processes. They may be willing to accept the software, irrespective of problems, because the costs of not using the software are greater than the costs of working around the problems.

Therefore, the outcome of negotiations may be conditional acceptance of the system. The customer may accept the system so that deployment can begin. The system provider agrees to repair urgent problems and deliver a new version to the customer as quickly as possible.

In agile methods such as Extreme Programming, there may be no separate acceptance testing activity. The end-user is part of the development team (i.e., he or she is an alpha tester) and provides the system requirements in terms of user stories. He or she is also responsible for defining the tests, which decide whether or not the developed software supports the user stories. These tests are therefore equivalent to acceptance tests. The tests are automated, and development does not proceed until the story acceptance tests have successfully been executed.

When users are embedded in a software development team, they should ideally be "typical" users with general knowledge of how the system will be used. However, it can be difficult to find such users, and so the acceptance tests may actually not be a true reflection of how a system is used in practice. Furthermore, the requirement for automated testing limits the flexibility of testing interactive systems. For such systems, acceptance testing may require groups of end-users to use the system as if it was part of their everyday work. Therefore, while an "embedded user" is an attractive notion in principle, it does not necessarily lead to high-quality tests of the system.

The problem of user involvement in agile teams is one reason why many companies use a mix of agile and more traditional testing. The system may be developed using agile techniques, but, after completion of a major release, separate acceptance testing is used to decide if the system should be accepted.


Key Points

Testing can only show the presence of errors in a program. It cannot show that there are no remaining faults.

Development testing is the responsibility of the software development team. A separate team should be responsible for testing a system before it is released to customers. In the user testing process, customers or system users provide test data and check that tests are successful.

Development testing includes unit testing, in which you test individual objects and methods; component testing, in which you test related groups of objects; and system testing, in which you test partial or complete systems.

When testing software, you should try to "break" the software by using experience and guidelines to choose types of test cases that have been effective in discovering defects in other systems.

Wherever possible, you should write automated tests. The tests are embedded in a program that can be run every time a change is made to a system.

Test-first development is an approach to development whereby tests are written before the code to be tested. Small code changes are made, and the code is refactored until all tests execute successfully.

Scenario testing is useful because it replicates the practical use of the system. It involves inventing a typical usage scenario and using this to derive test cases.

Acceptance testing is a user testing process in which the aim is to decide if the software is good enough to be deployed and used in its planned operational environment.

Further Reading

"How to design practical test cases." A how-to article on test-case design by an author from a Japanese company that has a good reputation for delivering software with very few faults. (T. Yamaura, IEEE Software, 15 (6), November 1998) http://dx.doi.org/10.1109/52.730835

"Test-driven development." This special issue on test-driven development includes a good general overview of TDD as well as experience papers on how TDD has been used for different types of software. (IEEE Software, 24 (3), May/June 2007)

Exploratory Software Testing. This is a practical, rather than theoretical, book on software testing which develops the ideas in Whittaker's earlier book, How to Break Software. The author presents a set of experience-based guidelines on software testing. (J. A. Whittaker, 2009, Addison-Wesley)

How Google Tests Software. This is a book about testing large-scale cloud-based systems, which pose a whole new set of challenges compared to custom software applications. While I don't think that the Google approach can be used directly, there are interesting lessons in this book for large-scale system testing. (J. Whittaker, J. Arbon, and J. Carollo, 2012, Addison-Wesley)


Website

PowerPoint slides for this chapter:
www.pearsonglobaleditions.com/Sommerville

Links to supporting videos:
http://software-engineering-book.com/videos/implementation-and-evolution/

Exercises

8.1. Explain how the number of known defects remaining in a program at the time of delivery affects product support.

8.2. Testing is meant to show that a program does what it is intended to do. Why may testers not always know what a program is intended for?

8.3. Some people argue that developers should not be involved in testing their own code but that all testing should be the responsibility of a separate team. Give arguments for and against testing by the developers themselves.

8.4. You have been asked to test a method called catWhiteSpace in a "Paragraph" object that, within the paragraph, replaces sequences of blank characters with a single blank character. Identify testing partitions for this example and derive a set of tests for the catWhiteSpace method.

8.5. What is regression testing? Explain how the use of automated tests and a testing framework such as JUnit simplifies regression testing.

8.6. The Mentcare system is constructed by adapting an off-the-shelf information system. What do you think are the differences between testing such a system and testing software that is developed using an object-oriented language such as Java?

8.7. Write a scenario that could be used to help design tests for the wilderness weather station system.

8.8. What do you understand by the term stress testing? Suggest how you might stress-test the Mentcare system.

8.9. What are the benefits of involving users in release testing at an early stage in the testing process? Are there disadvantages in user involvement?

8.10. A common approach to system testing is to test the more important functionalities of a system first, followed by the less important functionalities until the testing budget is exhausted. Discuss the ethics involved in identifying what "more important" means.


References

Andrea, J. 2007. “Envisioning the Next Generation of Functional Testing Tools.” IEEE Software 24 (3): 58–65. doi:10.1109/MS.2007.73.

Beck, K. 2002. Test Driven Development: By Example. Boston: Addison-Wesley.

Beizer, B. 1990. Software Testing Techniques, 2nd ed. New York: Van Nostrand Reinhold.

Boehm, B. W. 1979. “Software Engineering: R&D Trends and Defense Needs.” In Research Directions in Software Technology, edited by P. Wegner, 1–9. Cambridge, MA: MIT Press.

Cusumano, M., and R. W. Selby. 1998. Microsoft Secrets. New York: Simon & Schuster.

Dijkstra, E. W. 1972. “The Humble Programmer.” Comm. ACM 15 (10): 859–866. doi:10.1145/355604.361591.

Fagan, M. E. 1976. “Design and Code Inspections to Reduce Errors in Program Development.” IBM Systems J. 15 (3): 182–211.

Jeffries, R., and G. Melnik. 2007. “TDD: The Art of Fearless Programming.” IEEE Software 24 (3): 24–30. doi:10.1109/MS.2007.75.

Kaner, C. 2003. “An Introduction to Scenario Testing.” Software Testing and Quality Engineering (October 2003).

Lutz, R. R. 1993. “Analysing Software Requirements Errors in Safety-Critical Embedded Systems.” In RE’93, 126–133. San Diego, CA: IEEE. doi:10.1109/ISRE.1993.324825.

Martin, R. C. 2007. “Professionalism and Test-Driven Development.” IEEE Software 24 (3): 32–36. doi:10.1109/MS.2007.85.

Prowell, S. J., C. J. Trammell, R. C. Linger, and J. H. Poore. 1999. Cleanroom Software Engineering: Technology and Process. Reading, MA: Addison-Wesley.

Tahchiev, P., F. Leme, V. Massol, and G. Gregory. 2010. JUnit in Action, 2nd ed. Greenwich, CT: Manning Publications.

Whittaker, J. A. 2009. Exploratory Software Testing. Boston: Addison-Wesley.

9

Software evolution

Objectives

The objectives of this chapter are to explain why software evolution is such an important part of software engineering and to describe the challenges of maintaining a large base of software systems, developed over many years. When you have read this chapter, you will:

- understand that software systems have to adapt and evolve if they are to remain useful and that software change and evolution should be considered as an integral part of software engineering;
- understand what is meant by legacy systems and why these systems are important to businesses;
- understand how legacy systems can be assessed to decide whether they should be scrapped, maintained, reengineered, or replaced;
- have learned about different types of software maintenance and the factors that affect the costs of making changes to legacy software systems.

Contents

9.1 Evolution processes

9.2 Legacy systems

9.3 Software maintenance


Large software systems usually have a long lifetime. For example, military or infrastructure systems, such as air traffic control systems, may have a lifetime of 30 years or more. Business systems are often more than 10 years old. Enterprise software costs a lot of money, so a company has to use a software system for many years to get a return on its investment. Successful software products and apps may have been introduced many years ago with new versions released every few years. For example, the first version of Microsoft Word was introduced in 1983, so it has been around for more than 30 years.

During their lifetime, operational software systems have to change if they are to remain useful. Business changes and changes to user expectations generate new requirements for the software. Parts of the software may have to be modified to correct errors that are found in operation, to adapt it for changes to its hardware and software platform, and to improve its performance or other non-functional characteristics. Software products and apps have to evolve to cope with platform changes and new features introduced by their competitors. Software systems, therefore, adapt and evolve during their lifetime from initial deployment to final retirement.

Businesses have to change their software to ensure that they continue to get value from it. Their systems are critical business assets, and they have to invest in change to maintain the value of these assets. Consequently, most large companies spend more on maintaining existing systems than on new systems development. Historical data suggests that somewhere between 60% and 90% of software costs are evolution costs (Lientz and Swanson 1980; Erlikh 2000). Jones (Jones 2006) found that about 75% of development staff in the United States in 2006 were involved in software evolution and suggested that this percentage was unlikely to fall in the foreseeable future.

Software evolution is particularly expensive in enterprise systems when individual software systems are part of a broader “system of systems.” In such cases, you cannot just consider the changes to one system; you also need to examine how these changes affect the broader system of systems. Changing one system may mean that other systems in its environment may also have to evolve to cope with that change.

Therefore, as well as understanding and analyzing the impact of a proposed change on the system itself, you also have to assess how this change may affect other systems in the operational environment. Hopkins and Jenkins (Hopkins and Jenkins 2008) have coined the term brownfield software development to describe situations in which software systems have to be developed and managed in an environment where they are dependent on other software systems.

The requirements of installed software systems change as the business and its environment change, so new releases of the systems that incorporate changes and updates are usually created at regular intervals. Software engineering is therefore a spiral process with requirements, design, implementation, and testing going on throughout the lifetime of the system (Figure 9.1). You start by creating release 1 of the system. Once delivered, changes are proposed, and the development of release 2 starts almost immediately. In fact, the need for evolution may become obvious even before the system is deployed, so later releases of the software may start development before the current version has even been released.

Figure 9.1 A spiral model of development and evolution (from a starting point, the process spirals through specification, implementation, validation, and operation, producing release 1, release 2, release 3, and so on)

In the last 10 years, the time between iterations of the spiral has reduced dramatically. Before the widespread use of the Internet, new versions of a software system may only have been released every 2 or 3 years. Now, because of competitive pressures and the need to respond quickly to user feedback, the gap between releases of some apps and web-based systems may be weeks rather than years.

This model of software evolution is applicable when the same company is responsible for the software throughout its lifetime. There is a seamless transition from development to evolution, and the same software development methods and processes are applied throughout the lifetime of the software. Software products and apps are developed using this approach.

The evolution of custom software, however, usually follows a different model. The system customer may pay a software company to develop the software and then take over responsibility for support and evolution using its own staff. Alternatively, the software customer might issue a separate contract to a different software company for system support and evolution.

In this situation, there are likely to be discontinuities in the evolution process. Requirements and design documents may not be passed from one company to another. Companies may merge or reorganize, inherit software from other companies, and then find that this has to be changed. When the transition from development to evolution is not seamless, the process of changing the software after delivery is called software maintenance. As I discuss later in this chapter, maintenance involves extra process activities, such as program understanding, in addition to the normal activities of software development.

Rajlich and Bennett (Rajlich and Bennett 2000) propose an alternative view of the software evolution life cycle for business systems. In this model, they distinguish between evolution and servicing. Evolution is the phase in which significant changes to the software architecture and functionality are made. During servicing, the only changes that are made are relatively small but essential changes. These phases overlap with each other, as shown in Figure 9.2.

Figure 9.2 Evolution and servicing (over time, the software moves through overlapping phases of software development, software evolution, software servicing, and software retirement)

According to Rajlich and Bennett, when software is first used successfully, many changes to the requirements by stakeholders are proposed and implemented. This is the evolution phase. However, as the software is modified, its structure tends to degrade, and system changes become more and more expensive. This often happens after a few years of use when other environmental changes, such as hardware and operating systems, are also required. At some stage in the life cycle, the software reaches a transition point where significant changes and the implementation of new requirements become less and less cost-effective. At this stage, the software moves from evolution to servicing.

During the servicing phase, the software is still useful, but only small tactical changes are made to it. During this stage, the company is usually considering how the software can be replaced. In the final stage, the software may still be used, but only essential changes are made. Users have to work around problems that they discover.

Eventually, the software is retired and taken out of use. This often incurs further costs as data is transferred from an old system to a newer replacement system.

9.1 Evolution processes

As with all software processes, there is no such thing as a standard software change or evolution process. The most appropriate evolution process for a software system depends on the type of software being maintained, the software development processes used in an organization, and the skills of the people involved. For some types of system, such as mobile apps, evolution may be an informal process, where change requests mostly come from conversations between system users and developers. For other types of systems, such as embedded critical systems, software evolution may be formalized, with structured documentation produced at each stage in the process.

Formal or informal system change proposals are the driver for system evolution in all organizations. In a change proposal, an individual or group suggests changes and updates to an existing software system. These proposals may be based on existing requirements that have not been implemented in the released system, requests for new requirements, bug reports from system stakeholders, and new ideas for software improvement from the system development team. The processes of change identification and system evolution are cyclical and continue throughout the lifetime of a system (Figure 9.3).

Before a change proposal is accepted, there needs to be an analysis of the software to work out which components need to be changed. This analysis allows the cost and the impact of the change to be assessed. This is part of the general process of change management, which should also ensure that the correct versions of components are included in each system release. I discuss change and configuration management in Chapter 25.

Figure 9.3 Change identification and evolution processes (the change identification process generates change proposals that drive the software evolution process, which in turn delivers a new system and prompts further change identification)

Figure 9.4 shows some of the activities involved in software evolution. The process includes the fundamental activities of change analysis, release planning, system implementation, and releasing a system to customers. The cost and impact of these changes are assessed to see how much of the system is affected by the change and how much it might cost to implement the change.

If the proposed changes are accepted, a new release of the system is planned. During release planning, all proposed changes (fault repair, adaptation, and new functionality) are considered. A decision is then made on which changes to implement in the next version of the system. The changes are implemented and validated, and a new version of the system is released. The process then iterates with a new set of changes proposed for the next release.
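The release-planning step described above can be sketched in code. This is a minimal illustration, not a real tool: the proposal records, the effort figures, and the rule that fault repairs take priority over adaptations and new functionality are all invented for the example.

```python
# Hypothetical sketch of release planning: accepted change proposals are
# grouped by type and selected for the next release until an effort
# budget is exhausted.

FAULT_REPAIR, ADAPTATION, NEW_FUNCTIONALITY = (
    "fault repair", "adaptation", "new functionality")

def plan_release(proposals, effort_budget):
    """Pick changes for the next release, most urgent type first."""
    # In this sketch, fault repairs outrank adaptations, which outrank
    # new functionality; a real organization would weigh many factors.
    order = {FAULT_REPAIR: 0, ADAPTATION: 1, NEW_FUNCTIONALITY: 2}
    selected = []
    for change in sorted(proposals, key=lambda c: order[c["type"]]):
        if change["effort"] <= effort_budget:
            selected.append(change["id"])
            effort_budget -= change["effort"]
    return selected

proposals = [
    {"id": "CR-12", "type": NEW_FUNCTIONALITY, "effort": 20},
    {"id": "CR-7", "type": FAULT_REPAIR, "effort": 5},
    {"id": "CR-9", "type": ADAPTATION, "effort": 10},
]
print(plan_release(proposals, effort_budget=16))  # → ['CR-7', 'CR-9']
```

The deferred proposal (CR-12) is not discarded; it simply becomes a candidate for the following release, which is the iteration the text describes.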

In situations where development and evolution are integrated, change implementation is simply an iteration of the development process. Revisions to the system are designed, implemented, and tested. The only difference between initial development and evolution is that customer feedback after delivery has to be considered when planning new releases of an application.

Where different teams are involved, a critical difference between development and evolution is that the first stage of change implementation requires program understanding.

Figure 9.4 A general model of the software evolution process (change requests → impact analysis → release planning → change implementation → system release; release planning covers fault repair, platform adaptation, and system enhancement)

Figure 9.5 Change implementation (proposed changes → requirements analysis → requirements updating → software development)

During the program understanding phase, new developers have to understand how the program is structured, how it delivers functionality, and how the proposed change might affect the program. They need this understanding to make sure that the implemented change does not cause new problems when it is introduced into the existing system.

If requirements specification and design documents are available, these should be updated during the evolution process to reflect the changes that are required (Figure 9.5). New software requirements should be written, and these should be analyzed and validated. If the design has been documented using UML models, these models should be updated. The proposed changes may be prototyped as part of the change analysis process, where you assess the implications and costs of making the change.

However, change requests sometimes relate to problems in operational systems that have to be tackled urgently. These urgent changes can arise for three reasons:

1. If a serious system fault is detected that has to be repaired to allow normal operation to continue or to address a serious security vulnerability.

2. If changes to the system's operating environment have unexpected effects that disrupt normal operation.

3. If there are unanticipated changes to the business running the system, such as the emergence of new competitors or the introduction of new legislation that affects the system.

In these cases, the need to make the change quickly means that you may not be able to update all of the software documentation. Rather than modify the requirements and design, you make an emergency fix to the program to solve the immediate problem (Figure 9.6). The danger here is that the requirements, the software design, and the code can become inconsistent. While you may intend to document the change in the requirements and design, additional emergency fixes to the software may then be needed. These take priority over documentation. Eventually, the original change is forgotten, and the system documentation and code are never realigned. This problem of maintaining multiple representations of a system is one of the arguments for minimal documentation, which is fundamental to agile development processes.

Emergency system repairs have to be completed as quickly as possible. You choose a quick and workable solution rather than the best solution as far as system structure is concerned. This tends to accelerate the process of software ageing so that future changes become progressively more difficult and maintenance costs increase.

Figure 9.6 The emergency repair process (change requests → analyze source code → modify source code → deliver modified system)

Ideally, after emergency code repairs are made, the new code should be refactored and improved to avoid program degradation. Of course, the code of the repair may be reused if possible. However, an alternative, better solution to the problem may be discovered when more time is available for analysis.

Agile methods and processes, discussed in Chapter 3, may be used for program evolution as well as program development. Because these methods are based on incremental development, making the transition from agile development to postdelivery evolution should be seamless.

However, problems may arise during the handover from a development team to a separate team responsible for system evolution. There are two potentially problematic situations:

1. Where the development team has used an agile approach but the evolution team prefers a plan-based approach. The evolution team may expect detailed documentation to support evolution, and this is rarely produced in agile processes. There may be no definitive statement of the system requirements that can be modified as changes are made to the system.

2. Where a plan-based approach has been used for development but the evolution team prefers to use agile methods. In this case, the evolution team may have to start from scratch developing automated tests. The code in the system may not have been refactored and simplified, as is expected in agile development. In this case, some program reengineering may be required to improve the code before it can be used in an agile development process.

Agile techniques such as test-driven development and automated regression testing are useful when system changes are made. System changes may be expressed as user stories, and customer involvement can help prioritize changes that are required in an operational system. The Scrum approach of focusing on a backlog of work to be done can help prioritize the most important system changes. In short, evolution simply involves continuing the agile development process.
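As a concrete illustration of automated regression testing during evolution, the sketch below uses Python rather than the JUnit/Java setting mentioned elsewhere in the book; the function and its tests are invented for the example. The point is that the tests are rerun after every change, so a modification that silently breaks existing behavior is caught immediately.

```python
import re

# Hypothetical example: a function whose current behavior users depend on.
def normalize_spaces(text):
    """Replace each run of whitespace with a single space."""
    return re.sub(r"\s+", " ", text)

# Automated regression tests: rerun after every change to confirm that
# already-working functionality still behaves as before.
def test_normalize_spaces():
    assert normalize_spaces("a  b") == "a b"
    assert normalize_spaces("a\t\nb") == "a b"
    assert normalize_spaces("") == ""

test_normalize_spaces()
print("regression tests passed")
```

With a test runner such as pytest, tests like these are collected and executed automatically, which is what makes running them on every change cheap enough to do routinely.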

Agile methods used in development may, however, have to be modified when they are used for program maintenance and evolution. It may be practically impossible to involve users in the development team as change proposals come from a wide range of stakeholders. Short development cycles may have to be interrupted to deal with emergency repairs, and the gap between releases may have to be lengthened to avoid disrupting operational processes.

9.2 Legacy systems

Large companies started computerizing their operations in the 1960s, so for the past 50 years or so, more and more software systems have been introduced. Many of these systems have been replaced (sometimes several times) as the business has changed and evolved. However, a lot of old systems are still in use and play a critical part in the running of the business. These older software systems are sometimes called legacy systems.


Legacy systems are older systems that rely on languages and technology that are no longer used for new systems development. Typically, they have been maintained over a long period, and their structure may have been degraded by the changes that have been made. Legacy software may be dependent on older hardware, such as mainframe computers, and may have associated legacy processes and procedures. It may be impossible to change to more effective business processes because the legacy software cannot be modified to support new processes.

Legacy systems are not just software systems but are broader sociotechnical systems that include hardware, software, libraries, and other supporting software and business processes. Figure 9.7 shows the logical parts of a legacy system and their relationships.

1. System hardware. Legacy systems may have been written for hardware that is no longer available, that is expensive to maintain, and that may not be compatible with current organizational IT purchasing policies.

2. Support software. The legacy system may rely on a range of support software, from the operating system and utilities provided by the hardware manufacturer through to the compilers used for system development. Again, these may be obsolete and no longer supported by their original providers.

3. Application software. The application system that provides the business services is usually made up of a number of application programs that have been developed at different times. Some of these programs will also be part of other application software systems.

4. Application data. These data are processed by the application system. In many legacy systems, an immense volume of data has accumulated over the lifetime of the system. This data may be inconsistent, may be duplicated in several files, and may be spread over a number of different databases.

5. Business processes. These processes are used in the business to achieve some business objective. An example of a business process in an insurance company would be issuing an insurance policy; in a manufacturing company, a business process would be accepting an order for products and setting up the associated manufacturing process. Business processes may be designed around a legacy system and constrained by the functionality that it provides.

6. Business policies and rules. These are definitions of how the business should be carried out and constraints on the business. Use of the legacy application system may be embedded in these policies and rules.

Figure 9.7 The elements of a legacy system (system hardware, support software, application software, application data, business processes, and business policies and rules, connected by runs-on, uses, embeds-knowledge-of, and constrains relationships)

An alternative way of looking at these components of a legacy system is as a series of layers, as shown in Figure 9.8.

Each layer depends on the layer immediately below it and interfaces with that layer. If interfaces are maintained, then you should be able to make changes within a layer without affecting either of the adjacent layers. In practice, however, this simple encapsulation is an oversimplification, and changes to one layer of the system may require consequent changes to layers that are both above and below the changed level. The reasons for this are as follows:

1. Changing one layer in the system may introduce new facilities, and higher layers in the system may then be changed to take advantage of these facilities. For example, a new database introduced at the support software layer may include facilities to access the data through a web browser, and business processes may be modified to take advantage of this facility.

2. Changing the software may slow the system down so that new hardware is needed to improve the system performance. The increase in performance from the new hardware may then mean that further software changes that were previously impractical become possible.

3. It is often impossible to maintain hardware interfaces, especially if new hardware is introduced. This is a particular problem in embedded systems where there is a tight coupling between software and hardware. Major changes to the application software may be required to make effective use of the new hardware.

It is difficult to know exactly how much legacy code is still in use, but, as an indicator, industry has estimated that there are more than 200 billion lines of COBOL code in current business systems. COBOL is a programming language designed for writing business systems, and it was the main business development language from the 1960s to the 1990s, particularly in the finance industry (Mitchell 2012). These programs still work effectively and efficiently, and the companies using them see no need to change them. A major problem that they face, however, is a shortage of COBOL programmers as the original developers of the system retire. Universities no longer teach COBOL, and younger software engineers are more interested in programming in modern languages.

Skill shortages are only one of the problems of maintaining business legacy systems. Other issues include security vulnerabilities, because these systems were developed before the widespread use of the Internet, and problems in interfacing with systems written in modern programming languages. The original software tool supplier may be out of business or may no longer maintain the support tools used to develop the system. The system hardware may be obsolete and so increasingly expensive to maintain.

Figure 9.8 Legacy system layers (a sociotechnical system made up of, from top to bottom, business processes, application software, platform and infrastructure software, and hardware)

Why then do businesses not simply replace these systems with more modern equivalents? The simple answer to this question is that it is too expensive and too risky to do so. If a legacy system works effectively, the costs of replacement may exceed the savings that come from the reduced support costs of a new system. Scrapping legacy systems and replacing them with more modern software open up the possibility of things going wrong and the new system failing to meet the needs of the business. Managers try to minimize those risks and therefore do not want to face the uncertainties of new software systems.

I discovered some of the problems of legacy system replacement when I was involved in analyzing a legacy system replacement project in a large organization. This enterprise used more than 150 legacy systems to run its business. It decided to replace all of these systems with a single, centrally maintained ERP system. For a number of business and technology reasons, the new system development was a failure, and it did not deliver the improvements promised. After spending more than £10 million, only a part of the new system was operational, and it worked less effectively than the systems it replaced. Users continued to use the older systems but could not integrate these with the part of the new system that had been implemented, so additional manual processing was required.

There are several reasons why it is expensive and risky to replace legacy systems with new systems:

1. There is rarely a complete specification of the legacy system. The original specification may have been lost. If a specification exists, it is unlikely that it has been updated with all of the system changes that have been made. Therefore, there is no straightforward way of specifying a new system that is functionally identical to the system that is in use.

2. Business processes and the ways in which legacy systems operate are often inextricably intertwined. These processes are likely to have evolved to take advantage of the software’s services and to work around the software’s shortcomings. If the system is replaced, these processes have to change, with potentially unpredictable costs and consequences.

3. Important business rules may be embedded in the software and may not be documented elsewhere. A business rule is a constraint that applies to some business function, and breaking that constraint can have unpredictable consequences for the business. For example, an insurance company may have embedded its rules for assessing the risk of a policy application in its software. If these rules are not maintained, the company may accept high-risk policies that could result in expensive future claims.

4. New software development is inherently risky, so there may be unexpected problems with a new system. It may not be delivered on time and for the price expected.

Keeping legacy systems in use avoids the risks of replacement, but making changes to existing software inevitably becomes more expensive as systems get older. Legacy software systems that are more than a few years old are particularly expensive to change:

1. The program style and usage conventions are inconsistent because different people have been responsible for system changes. This problem adds to the difficulty of understanding the system code.

2. Part or all of the system may be implemented using obsolete programming languages. It may be difficult to find people who have knowledge of these languages. Expensive outsourcing of system maintenance may therefore be required.

3. System documentation is often inadequate and out of date. In some cases, the only documentation is the system source code.

4. Many years of maintenance usually degrades the system structure, making it increasingly difficult to understand. New programs may have been added and interfaced with other parts of the system in an ad hoc way.

5. The system may have been optimized for space utilization or execution speed so that it runs effectively on older, slower hardware. This normally involves using specific machine and language optimizations, and these usually lead to software that is hard to understand. This causes problems for programmers who have learned modern software engineering techniques and who don’t understand the programming tricks that have been used to optimize the software.

6. The data processed by the system may be maintained in different files that have incompatible structures. There may be data duplication, and the data itself may be out of date, inaccurate, and incomplete. Several databases from different suppliers may be used.

At some stage, the costs of managing and maintaining the legacy system become so high that it has to be replaced with a new system. In the next section, I discuss a systematic decision-making approach to making such a replacement decision.


9.2.1 Legacy system management

For new software systems developed using modern software engineering processes, such as agile development and software product lines, it is possible to plan how to integrate system development and evolution. More and more companies understand that the system development process is a whole life-cycle process. Separating software development and software evolution is unhelpful and leads to higher costs. However, as I have discussed, there is still a huge number of legacy systems that are critical business systems. These have to be extended and adapted to changing e-business practices.

Most organizations have a limited budget for maintaining and upgrading their portfolio of legacy systems. They have to decide how to get the best return on their investment. This involves making a realistic assessment of their legacy systems and then deciding on the most appropriate strategy for evolving these systems. There are four strategic options:

1. Scrap the system completely. This option should be chosen when the system is not making an effective contribution to business processes. This usually occurs when business processes have changed since the system was installed and are no longer reliant on the legacy system.

2. Leave the system unchanged and continue with regular maintenance. This option should be chosen when the system is still required but is fairly stable and the system users make relatively few change requests.

3. Reengineer the system to improve its maintainability. This option should be chosen when the system quality has been degraded by change and where new changes to the system are still being proposed. This process may include developing new interface components so that the original system can work with other, newer systems.

4. Replace all or part of the system with a new system. This option should be chosen when factors, such as new hardware, mean that the old system cannot continue in operation, or where off-the-shelf systems would allow the new system to be developed at a reasonable cost. In many cases, an evolutionary replacement strategy can be adopted where major system components are replaced by off-the-shelf systems, with other components reused wherever possible.

When you are assessing a legacy system, you have to look at it from both a business perspective and a technical perspective (Warren 1998). From a business perspective, you have to decide whether or not the business really needs the system. From a technical perspective, you have to assess the quality of the application software and the system’s support software and hardware. You then use a combination of the business value and the system quality to inform your decision on what to do with the legacy system.

For example, assume that an organization has 10 legacy systems. You should assess the quality and the business value of each of these systems. You may then create a chart showing relative business value and system quality. An example of this is shown in Figure 9.9.

[Figure 9.9 An example of a legacy system assessment: systems 1–10 plotted by business value against system quality, falling into four quadrants: low business value/low quality, low business value/high quality, high business value/low quality, and high business value/high quality.]

From this diagram, you can see that there are four clusters of systems:

1. Low quality, low business value Keeping these systems in operation will be expensive, and the rate of the return to the business will be fairly small. These systems should be scrapped.

2. Low quality, high business value These systems are making an important business contribution, so they cannot be scrapped. However, their low quality means that they are expensive to maintain. These systems should be reengineered to improve their quality. They may be replaced, if suitable off-the-shelf systems are available.

3. High quality, low business value These systems don’t contribute much to the business but may not be very expensive to maintain. It is not worth replacing these systems, so normal system maintenance may be continued if expensive changes are not required and the system hardware remains in use. If expensive changes become necessary, the software should be scrapped.

4. High quality, high business value These systems have to be kept in operation. However, their high quality means that you don’t have to invest in transformation or system replacement. Normal system maintenance should be continued.
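The mapping from assessment quadrant to evolution strategy can be sketched as a simple decision function. This is an illustrative sketch only; the 0–10 scales, the midpoint threshold, and the portfolio figures are assumptions, not values from the text.

```python
# Sketch: map a legacy system's assessed business value and quality
# onto one of the four evolution strategies. Scales and the midpoint
# threshold are illustrative assumptions.

def legacy_strategy(business_value: float, quality: float,
                    threshold: float = 5.0) -> str:
    """Classify a system assessed on 0-10 scales into a strategy."""
    high_value = business_value >= threshold
    high_quality = quality >= threshold
    if not high_value and not high_quality:
        return "scrap"                      # low value, low quality
    if high_value and not high_quality:
        return "reengineer or replace"      # valuable but degraded
    if not high_value and high_quality:
        return "maintain while cheap"       # cheap to keep, little value
    return "continue normal maintenance"    # high value, high quality

# Assess a portfolio of systems (numbers are made up for illustration).
portfolio = {"payroll": (9, 3), "old-reports": (2, 2), "billing": (8, 8)}
for name, (value, quality) in portfolio.items():
    print(name, "->", legacy_strategy(value, quality))
```

A real assessment would, of course, rest on the stakeholder questions and quality factors discussed below rather than on two bare numbers.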

The business value of a system is a measure of how much time and effort the system saves compared to manual processes or the use of other systems. To assess the business value of a system, you have to identify system stakeholders, such as the end-users of a system and their managers, and ask a series of questions about the system. There are four basic issues that you have to discuss:

1. The use of the system If a system is only used occasionally or by a small number of people, this may mean that it has a low business value. A legacy system may have been developed to meet a business need that has either changed or can now be met more effectively in other ways. You have to be careful, however, about occasional but important use of systems. For example, a university system for student registration may only be used at the beginning of each academic year. Although it is used infrequently, it is an essential system with a high business value.

2. The business processes that are supported When a system is introduced, business processes are usually introduced to exploit the system’s capabilities. If the system is inflexible, changing these business processes may be impossible. However, as the environment changes, the original business processes may become obsolete. Therefore, a system may have a low business value because it forces the use of inefficient business processes.

3. System dependability System dependability is not only a technical problem but also a business problem. If a system is not dependable and the problems directly affect business customers, or mean that people in the business are diverted from other tasks to solve these problems, the system has a low business value.

4. The system outputs The key issue here is the importance of the system outputs to the successful functioning of the business. If the business depends on these outputs, then the system has a high business value. Conversely, if these outputs can be cheaply generated in some other way, or if the system produces outputs that are rarely used, then the system has a low business value.

For example, assume that a company provides a travel ordering system that is used by staff responsible for arranging travel. They can place orders with an approved travel agent. Tickets are then delivered, and the company is invoiced for them. However, a business value assessment may reveal that this system is only used for a fairly small percentage of travel orders placed. People making travel arrangements find it cheaper and more convenient to deal directly with travel suppliers through their websites. This system may still be used, but there is no real point in keeping it; the same functionality is available from external systems.

Conversely, say a company has developed a system that keeps track of all previous customer orders and automatically generates reminders for customers to reorder goods. This results in a large number of repeat orders and keeps customers satisfied because they feel that their supplier is aware of their needs. The outputs from such a system are important to the business, so this system has a high business value.

To assess a software system from a technical perspective, you need to consider both the application system itself and the environment in which the system operates. The environment includes the hardware and all associated support software, such as compilers, debuggers, and development environments, that are needed to maintain the system. The environment is important because many system changes, such as upgrades to the hardware or operating system, result from changes to the environment.

Factors that you should consider during the environment assessment are shown in Figure 9.10. Notice that these are not all technical characteristics of the environment. You also have to consider the reliability of the suppliers of the hardware and support software. If suppliers are no longer in business, their systems may not be supported, so you may have to replace these systems.

Figure 9.10 Factors used in environment assessment

Supplier stability: Is the supplier still in existence? Is the supplier financially stable and likely to continue in existence? If the supplier is no longer in business, does someone else maintain the systems?

Failure rate: Does the hardware have a high rate of reported failures? Does the support software crash and force system restarts?

Age: How old is the hardware and software? The older the hardware and support software, the more obsolete it will be. It may still function correctly, but there could be significant economic and business benefits to moving to a more modern system.

Performance: Is the performance of the system adequate? Do performance problems have a significant effect on system users?

Support requirements: What local support is required by the hardware and software? If high costs are associated with this support, it may be worth considering system replacement.

Maintenance costs: What are the costs of hardware maintenance and support software licences? Older hardware may have higher maintenance costs than modern systems. Support software may have high annual licensing costs.

Interoperability: Are there problems interfacing the system to other systems? Can compilers, for example, be used with current versions of the operating system?

In the process of environmental assessment, you should, if possible, collect data about the system and system changes. Examples of data that may be useful include the costs of maintaining the system hardware and support software, the number of hardware faults that occur over some time period, and the frequency of patches and fixes applied to the system support software.
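As a hedged sketch of how such collected data might be organized for comparison across systems, the records and the cost-per-fault indicator below are illustrative assumptions, not metrics prescribed by the text:

```python
# Sketch: summarize collected environment data for comparison across
# systems. Field names and figures are illustrative assumptions.

systems = [
    {"name": "stock-control", "annual_maintenance_cost": 120_000,
     "hardware_faults_per_year": 14, "support_patches_per_year": 22},
    {"name": "invoicing", "annual_maintenance_cost": 30_000,
     "hardware_faults_per_year": 2, "support_patches_per_year": 5},
]

def cost_per_fault(record) -> float:
    """Crude indicator: money spent per recorded hardware fault."""
    return record["annual_maintenance_cost"] / max(record["hardware_faults_per_year"], 1)

# List the most expensive environments first.
for s in sorted(systems, key=lambda r: r["annual_maintenance_cost"], reverse=True):
    print(f'{s["name"]}: cost/fault = {cost_per_fault(s):,.0f}')
```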

To assess the technical quality of an application system, you have to assess those factors (Figure 9.11) that are primarily related to the system dependability, the difficulties of maintaining the system, and the system documentation. You may also collect data that will help you judge the quality of the system, such as:

1. The number of system change requests System changes usually corrupt the system structure and make further changes more difficult. The higher this accumulated value, the lower the quality of the system.

2. The number of user interfaces This is an important factor in forms-based systems where each form can be considered as a separate user interface. The more interfaces, the more likely it is that there will be inconsistencies and redundancies in these interfaces.

3. The volume of data used by the system As the volume of data (number of files, size of database, etc.) processed by the system increases, so too do the inconsistencies and errors in that data. When data has been collected over a long period of time, errors and inconsistencies are inevitable. Cleaning up old data is a very expensive and time-consuming process.

Figure 9.11 Factors used in application assessment

Understandability: How difficult is it to understand the source code of the current system? How complex are the control structures that are used? Do variables have meaningful names that reflect their function?

Documentation: What system documentation is available? Is the documentation complete, consistent, and current?

Data: Is there an explicit data model for the system? To what extent is data duplicated across files? Is the data used by the system up to date and consistent?

Performance: Is the performance of the application adequate? Do performance problems have a significant effect on system users?

Programming language: Are modern compilers available for the programming language used to develop the system? Is the programming language still used for new system development?

Configuration management: Are all versions of all parts of the system managed by a configuration management system? Is there an explicit description of the versions of components that are used in the current system?

Test data: Does test data for the system exist? Is there a record of regression tests carried out when new features have been added to the system?

Personnel skills: Are there people available who have the skills to maintain the application? Are there people available who have experience with the system?

Ideally, objective assessment should be used to inform decisions about what to do with a legacy system. However, in many cases, decisions are not really objective but are based on organizational or political considerations. For example, if two businesses merge, the most politically powerful partner will usually keep its systems and scrap the other company’s systems. If senior management in an organization decides to move to a new hardware platform, then this may require applications to be replaced. If no budget is available for system transformation in a particular year, then system maintenance may be continued, even though this will result in higher long-term costs.

9.3 Software maintenance

Software maintenance is the general process of changing a system after it has been delivered. The term is usually applied to custom software, where separate development groups are involved before and after delivery. The changes made to the software may be simple changes to correct coding errors, more extensive changes to correct design errors, or significant enhancements to correct specification errors or to accommodate new requirements. Changes are implemented by modifying existing system components and, where necessary, by adding new components to the system.

Program evolution dynamics

Program evolution dynamics is the study of evolving software systems, pioneered by Manny Lehman and Les Belady in the 1970s. This led to so-called Lehman’s Laws, which are said to apply to all large-scale software systems. The most important of these laws are:

1. A program must continually change if it is to remain useful.
2. As an evolving program changes, its structure is degraded.
3. Over a program’s lifetime, the rate of change is roughly constant and independent of the resources available.
4. The incremental change in each release of a system is roughly constant.
5. New functionality must be added to systems to increase user satisfaction.

http://software-engineering-book.com/web/program-evolution-dynamics/

There are three different types of software maintenance:

1. Fault repairs to fix bugs and vulnerabilities. Coding errors are usually relatively cheap to correct; design errors are more expensive because they may involve rewriting several program components. Requirements errors are the most expensive to repair because extensive system redesign may be necessary.

2. Environmental adaptation to adapt the software to new platforms and environments. This type of maintenance is required when some aspect of a system’s environment, such as the hardware, the platform operating system, or other support software, changes. Application systems may have to be modified to cope with these environmental changes.

3. Functionality addition to add new features and to support new requirements. This type of maintenance is necessary when system requirements change in response to organizational or business change. The scale of the changes required to the software is often much greater than for the other types of maintenance.

In practice, there is no clear-cut distinction between these types of maintenance. When you adapt a system to a new environment, you may add functionality to take advantage of new environmental features. Software faults are often exposed because users use the system in unanticipated ways. Changing the system to accommodate their way of working is the best way to fix these faults.

These types of maintenance are generally recognized, but different people sometimes give them different names. “Corrective maintenance” is universally used to refer to maintenance for fault repair. However, “adaptive maintenance” sometimes means adapting to a new environment and sometimes means adapting the software to new requirements. “Perfective maintenance” sometimes means perfecting the software by implementing new requirements; in other cases, it means maintaining the functionality of the system but improving its structure and its performance. Because of this naming uncertainty, I have avoided the use of these terms in this book.

[Figure 9.12 Maintenance effort distribution: fault repair (24%), environmental adaptation (19%), functionality addition or modification (58%).]

Figure 9.12 shows an approximate distribution of maintenance costs, based on data from the most recent survey available (Davidsen and Krogstie 2010). This study compared maintenance cost distribution with a number of earlier studies from 1980 to 2005. The authors found that the distribution of maintenance costs had changed very little over 30 years. Although we don’t have more recent data, this suggests that this distribution is still largely correct. Repairing system faults is not the most expensive maintenance activity. Evolving the system to cope with new environments and new or changed requirements generally consumes most maintenance effort.

Experience has shown that it is usually more expensive to add new features to a system during maintenance than it is to implement the same features during initial development. The reasons for this are:

1. A new team has to understand the program being maintained. After a system has been delivered, it is normal for the development team to be broken up and for people to work on new projects. The new team or the individuals responsible for system maintenance do not understand the system or the background to system design decisions. They need to spend time understanding the existing system before they can implement changes to it.

2. Separating maintenance and development means there is no incentive for the development team to write maintainable software. The contract to maintain a system is usually separate from the system development contract. A different company, rather than the original software developer, may be responsible for software maintenance. In those circumstances, a development team gets no benefit from investing effort to make the software maintainable. If a development team can cut corners to save effort during development, it is worthwhile for them to do so, even if this means that the software is more difficult to change in future.

3. Program maintenance work is unpopular. Maintenance has a poor image among software engineers. It is seen as a less skilled process than system development and is often allocated to the least experienced staff. Furthermore, old systems may be written in obsolete programming languages. The developers working on maintenance may not have much experience of these languages and must learn these languages to maintain the system.

Documentation

System documentation can help the maintenance process by providing maintainers with information about the structure and organization of the system and the features that it offers to system users. While proponents of agile approaches suggest that the code should be the principal documentation, higher level design models and information about dependencies and constraints can make it easier to understand and make changes to that code.

http://software-engineering-book.com/web/documentation/ (web chapter)

4. As programs age, their structure degrades and they become harder to change. As changes are made to programs, their structure tends to degrade. Consequently, they become harder to understand and change. Some systems have been developed without modern software engineering techniques. They may never have been well structured and were perhaps optimized for efficiency rather than understandability. System documentation may be lost or inconsistent. Old systems may not have been subject to stringent configuration management, so developers have to spend time finding the right versions of system components to change.

The first three of these problems stem from the fact that many organizations still consider software development and maintenance to be separate activities. Maintenance is seen as a second-class activity, and there is no incentive to spend money during development to reduce the costs of system change. The only long-term solution to this problem is to think of systems as evolving throughout their lifetime through a continual development process. Maintenance should have as high a status as new software development.

The fourth issue, the problem of degraded system structure, is, in some ways, the easiest problem to address. Software reengineering techniques (described later in this chapter) may be applied to improve the system structure and understandability. Architectural transformations can adapt the system to new hardware. Refactoring can improve the quality of the system code and make it easier to change.

In principle, it is almost always cost-effective to invest effort in designing and implementing a system to reduce the costs of future changes. Adding new functionality after delivery is expensive because you have to spend time learning the system and analyzing the impact of the proposed changes. Work done during development to structure the software and to make it easier to understand and change will reduce evolution costs. Good software engineering techniques such as precise specification, test-first development, the use of object-oriented development, and configuration management all help reduce maintenance cost.

These principled arguments for lifetime cost savings by investing in making systems more maintainable are, unfortunately, impossible to substantiate with real data. Collecting data is expensive, and the value of that data is difficult to judge; therefore, the vast majority of companies do not think it is worthwhile to gather and analyze software engineering data.

In reality, most businesses are reluctant to spend more on software development to reduce longer-term maintenance costs. There are two main reasons for their reluctance:

1. Companies set out quarterly or annual spending plans, and managers are incentivized to reduce short-term costs. Investing in maintainability leads to short-term cost increases, which are measurable. However, the long-term gains can’t be measured at the same time, so companies are reluctant to spend money on something with an unknown future return.

2. Developers are not usually responsible for maintaining the system they have developed. Consequently, they don’t see the point of doing additional work that might reduce maintenance costs, as they will not get any benefit from it.

The only way around this problem is to integrate development and maintenance so that the original development team remains responsible for software throughout its lifetime. This is possible for software products and for companies such as Amazon, which develop and maintain their own software (O’Hanlon 2006). However, for custom software developed by a software company for a client, this is unlikely to happen.

9.3.1 Maintenance prediction

Maintenance prediction is concerned with trying to assess the changes that may be required in a software system and with identifying those parts of the system that are likely to be the most expensive to change. If you understand this, you can design the software components that are most likely to change to make them more adaptable. You can also invest effort in improving those components to reduce their lifetime maintenance costs. By predicting changes, you can also assess the overall maintenance costs for a system in a given time period and so set a budget for maintaining the software. Figure 9.13 shows possible predictions and the questions that these predictions may answer.

Predicting the number of change requests for a system requires an understanding of the relationship between the system and its external environment. Some systems have a very complex relationship with their external environment, and changes to that environment inevitably result in changes to the system. To evaluate the relationships between a system and its environment, you should look at:

1. The number and complexity of system interfaces The larger the number of interfaces and the more complex these interfaces, the more likely it is that interface changes will be required as new requirements are proposed.

2. The number of inherently volatile system requirements As I discussed in Chapter 4, requirements that reflect organizational policies and procedures are likely to be more volatile than requirements that are based on stable domain characteristics.

3. The business processes in which the system is used As business processes evolve, they generate system change requests. As a system is integrated with more and more business processes, there are increased demands for changes.

[Figure 9.13 Maintenance prediction: predicting system changes answers “What parts of the system are most likely to be affected by change requests?” and “How many change requests can be expected?”; predicting maintainability answers “What parts of the system will be the most expensive to maintain?”; predicting maintenance costs answers “What will be the lifetime maintenance costs of this system?” and “What will be the costs of maintaining this system over the next year?”]

In early work on software maintenance, researchers looked at the relationships between program complexity and maintainability (Banker et al. 1993; Coleman et al. 1994; Kozlov et al. 2008). These studies found that the more complex a system or component, the more expensive it is to maintain. Complexity measurements are particularly useful in identifying program components that are likely to be expensive to maintain. Therefore, to reduce maintenance costs, you should try to replace complex system components with simpler alternatives.

After a system has been put into service, you may be able to use process data to help predict maintainability. Examples of process metrics that can be used for assessing maintainability are:

1. Number of requests for corrective maintenance An increase in the number of bug and failure reports may indicate that more errors are being introduced into the program than are being repaired during the maintenance process. This may indicate a decline in maintainability.

2. Average time required for impact analysis This is related to the number of program components that are affected by the change request. If the time required for impact analysis increases, it implies that more and more components are affected and maintainability is decreasing.

3. Average time taken to implement a change request This is not the same as the time for impact analysis, although it may correlate with it. This is the amount of time that you need to modify the system and its documentation, after you have assessed which components are affected. An increase in the time needed to implement a change may indicate a decline in maintainability.

4. Number of outstanding change requests An increase in this number over time may imply a decline in maintainability.
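A minimal sketch of how these process metrics might be monitored: flag any metric whose recent observations are strictly rising. The metric series and the three-point window are illustrative assumptions, not values from the text.

```python
# Sketch: flag a possible decline in maintainability by checking whether
# a process metric is trending upward across releases. The metric values
# below are illustrative assumptions.

def is_rising(values, min_points: int = 3) -> bool:
    """True if the last min_points observations strictly increase."""
    tail = values[-min_points:]
    return len(tail) == min_points and all(a < b for a, b in zip(tail, tail[1:]))

outstanding_change_requests = [12, 11, 14, 17, 21]   # per release
avg_impact_analysis_days = [1.5, 1.4, 1.6, 1.5]

for name, series in [("outstanding change requests", outstanding_change_requests),
                     ("impact analysis time", avg_impact_analysis_days)]:
    if is_rising(series):
        print(f"warning: {name} rising - maintainability may be declining")
```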

You use predicted information about change requests and predictions about system maintainability to predict maintenance costs. Most managers combine this information with intuition and experience to estimate costs. The COCOMO 2 model of cost estimation, discussed in Chapter 23, suggests that an estimate for software maintenance effort can be based on the effort to understand existing code and the effort to develop the new code.
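The shape of this idea can be sketched as follows. This is not the actual COCOMO 2 model (see Chapter 23 for that); the rates below are illustrative assumptions chosen only to show that total effort combines understanding effort and new development effort.

```python
# Sketch of the idea behind maintenance effort estimation: total effort
# is the effort to understand the code being changed plus the effort to
# develop the new code. The rates are illustrative assumptions, not the
# actual COCOMO 2 parameters.

def maintenance_effort(kloc_to_understand: float, kloc_new: float,
                       understand_rate: float = 0.5,
                       develop_rate: float = 2.5) -> float:
    """Person-months = understanding effort + new development effort."""
    return kloc_to_understand * understand_rate + kloc_new * develop_rate

# e.g. a change that touches 20 KLOC of existing code and adds 2 KLOC:
print(maintenance_effort(20, 2))  # -> 15.0 person-months
```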

9.3.2 Software reengineering

Software maintenance involves understanding the program that has to be changed and then implementing any required changes. However, many systems, especially older legacy systems, are difficult to understand and change. The programs may have been optimized for performance or space utilization at the expense of understandability, or, over time, the initial program structure may have been corrupted by a series of changes.

To make legacy software systems easier to maintain, you can reengineer these systems to improve their structure and understandability. Reengineering may involve redocumenting the system, refactoring the system architecture, translating programs to a modern programming language, or modifying and updating the structure and values of the system’s data. The functionality of the software is not changed, and, normally, you should try to avoid making major changes to the system architecture.

Reengineering has two important advantages over replacement:

1. Reduced risk There is a high risk in redeveloping business-critical software. Errors may be made in the system specification or there may be development problems. Delays in introducing the new software may mean that business is lost and extra costs are incurred.

2. Reduced cost The cost of reengineering may be significantly less than the cost of developing new software. Ulrich (Ulrich 1990) quotes an example of a commercial system for which the reimplementation costs were estimated at $50 million. The system was successfully reengineered for $12 million. I suspect that, with modern software technology, the relative cost of reimplementation is probably less than Ulrich’s figure but will still be more than the costs of reengineering.

[Figure 9.14 The reengineering process: the original program, its program documentation, and the original data are transformed, through reverse engineering, source code translation, program structure improvement, program modularization, and data reengineering, into a reengineered program, a restructured program, and reengineered data.]

Figure 9.14 is a general model of the reengineering process. The input to the process is a legacy program, and the output is an improved and restructured version of the same program. The activities in this reengineering process are:

1. Source code translation Using a translation tool, you can convert the program from an old programming language to a more modern version of the same language or to a different language.

2. Reverse engineering The program is analyzed and information extracted from it. This helps to document its organization and functionality. Again, this process is usually completely automated.

3. Program structure improvement The control structure of the program is analyzed and modified to make it easier to read and understand. This can be partially automated, but some manual intervention is usually required.

4. Program modularization Related parts of the program are grouped together, and, where appropriate, redundancy is removed. In some cases, this stage may involve architectural refactoring (e.g., a system that uses several different data stores may be refactored to use a single repository). This is a manual process.

5. Data reengineering The data processed by the program is changed to reflect program changes. This may mean redefining database schemas and converting existing databases to the new structure. You should usually also clean up the data. This involves finding and correcting mistakes, removing duplicate records, and so on. This can be a very expensive and prolonged process.
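A minimal sketch of the data cleanup part of data reengineering: normalize inconsistent field values, then remove duplicate records. The record format and the deduplication key are illustrative assumptions.

```python
# Sketch: part of data reengineering is cleaning up old data - removing
# duplicate records and normalizing inconsistent values. The record
# format is an illustrative assumption.

records = [
    {"customer_id": "C100", "name": "Jones Ltd ", "country": "UK"},
    {"customer_id": "C100", "name": "Jones Ltd",  "country": "uk"},
    {"customer_id": "C205", "name": "Smith & Co", "country": "UK"},
]

def clean(rows):
    """Normalize fields, then drop duplicates by customer_id."""
    seen, result = set(), []
    for row in rows:
        row = {k: v.strip() for k, v in row.items()}
        row["country"] = row["country"].upper()
        if row["customer_id"] not in seen:
            seen.add(row["customer_id"])
            result.append(row)
    return result

print(clean(records))  # two records remain after cleanup
```

In practice this step is far harder than the sketch suggests, which is why the text warns that cleaning up old data is expensive and prolonged.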

Program reengineering may not necessarily require all of the steps in Figure 9.14. You don’t need source code translation if you still use the application’s programming language. If you can do all reengineering automatically, then recovering documentation through reverse engineering may be unnecessary. Data reengineering is required only if the data structures in the program change during system reengineering.

[Figure 9.15 Reengineering approaches, with cost increasing from left to right: automated source code conversion; automated program restructuring; automated restructuring with manual changes; program and data restructuring; restructuring plus architectural changes.]

To make the reengineered system interoperate with the new software, you may have to develop adaptor services, as discussed in Chapter 18. These hide the original interfaces of the software system and present new, better-structured interfaces that can be used by other components. This process of legacy system wrapping is an important technique for developing large-scale reusable services.

The costs of reengineering obviously depend on the extent of the work that is carried out. There is a spectrum of possible approaches to reengineering, as shown in Figure 9.15. Costs increase from left to right, so that source code translation is the cheapest option and reengineering as part of architectural migration is the most expensive.

The problem with software reengineering is that there are practical limits to how much you can improve a system by reengineering. It isn’t possible, for example, to convert a system written using a functional approach to an object-oriented system. Major architectural changes or radical reorganizing of the system data management cannot be carried out automatically, so they are very expensive. Although reengineering can improve maintainability, the reengineered system will probably not be as maintainable as a new system developed using modern software engineering methods.

9.3.3 Refactoring

Refactoring is the process of making improvements to a program to slow down degradation through change. It means modifying a program to improve its structure, reduce its complexity, or make it easier to understand. Refactoring is sometimes considered to be limited to object-oriented development, but the principles can in fact be applied to any development approach. When you refactor a program, you should not add functionality but rather should concentrate on program improvement. You can therefore think of refactoring as “preventative maintenance” that reduces the problems of future change.

Refactoring is an inherent part of agile methods because these methods are based on change. Program quality is liable to degrade quickly, so agile developers frequently refactor their programs to avoid this degradation. The emphasis on regression testing in agile methods lowers the risk of introducing new errors through refactoring. Any errors that are introduced should be detectable, as previously successful tests should then fail. However, refactoring is not dependent on other “agile activities.”


Although reengineering and refactoring are both intended to make software easier to understand and change, they are not the same thing. Reengineering takes place after a system has been maintained for some time, and maintenance costs are increasing. You use automated tools to process and reengineer a legacy system to create a new system that is more maintainable. Refactoring is a continuous process of improvement throughout the development and evolution process. It is intended to avoid the structure and code degradation that increases the costs and difficulties of maintaining a system.

Fowler et al. (1999) suggest that there are stereotypical situations (Fowler calls them “bad smells”) where the code of a program can be improved. Examples of bad smells that can be improved through refactoring include:

1. Duplicate code The same or very similar code may be included at different places in a program. This can be removed and implemented as a single method or function that is called as required.

2. Long methods If a method is too long, it should be redesigned as a number of shorter methods.

3. Switch (case) statements These often involve duplication, where the switch depends on the type of a value. The switch statements may be scattered around a program. In object-oriented languages, you can often use polymorphism to achieve the same thing.

4. Data clumping Data clumps occur when the same group of data items (fields in classes, parameters in methods) reoccurs in several places in a program. These can often be replaced with an object that encapsulates all of the data.

5. Speculative generality This occurs when developers include generality in a program in case it is required in the future. This can often simply be removed.
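As an illustration of the third bad smell, the sketch below contrasts a type-based conditional with the polymorphic version that replaces it. The shape classes are hypothetical examples, not code from the book.

```python
import math

# Bad smell: a type-based switch that may be duplicated wherever
# shapes are processed in the program.
def area_switch(kind, dims):
    if kind == "circle":
        return math.pi * dims[0] ** 2
    elif kind == "rectangle":
        return dims[0] * dims[1]
    raise ValueError("unknown shape: " + kind)

# Refactored: polymorphism replaces the switch, so adding a new shape
# means adding a class rather than editing every conditional.
class Circle:
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return math.pi * self.radius ** 2

class Rectangle:
    def __init__(self, width, height):
        self.width, self.height = width, height
    def area(self):
        return self.width * self.height
```

Each shape now knows how to compute its own area, and the scattered conditionals disappear.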

Fowler, in both his book and website, also suggests some primitive refactoring transformations that can be used singly or together to deal with bad smells. Examples of these transformations include Extract method, where you remove duplication and create a new method; Consolidate conditional expression, where you replace a sequence of tests with a single test; and Pull up method, where you replace similar methods in subclasses with a single method in a superclass. Interactive development environments, such as Eclipse, usually include refactoring support in their editors. This makes it easier to find dependent parts of a program that have to be changed to implement the refactoring.
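A minimal sketch of the Consolidate conditional expression transformation might look like the following. The eligibility rules and payment amount are illustrative assumptions, loosely modeled on Fowler's well-known disability-payment example.

```python
# Before: a sequence of separate tests, each producing the same result.
def disability_amount_before(seniority, months_disabled, is_part_time):
    if seniority < 2:
        return 0
    if months_disabled > 12:
        return 0
    if is_part_time:
        return 0
    return 50  # illustrative payment amount

# After "Consolidate conditional expression": the tests are combined
# into a single, clearly named condition.
def disability_amount_after(seniority, months_disabled, is_part_time):
    def is_not_eligible():
        return seniority < 2 or months_disabled > 12 or is_part_time
    return 0 if is_not_eligible() else 50
```

The behavior is unchanged; the refactoring simply makes the intent of the tests explicit in one place.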

Refactoring, carried out during program development, is an effective way to reduce the long-term maintenance costs of a program. However, if you take over a program for maintenance whose structure has been significantly degraded, then it may be practically impossible to refactor the code alone. You may also have to think about design refactoring, which is likely to be a more expensive and difficult problem. Design refactoring involves identifying relevant design patterns (discussed in Chapter 7) and replacing existing code with code that implements these design patterns (Kerievsky 2004).


Key Points

Software development and evolution can be thought of as an integrated, iterative process that can be represented using a spiral model.

For custom systems, the costs of software maintenance usually exceed the software development costs.

The process of software evolution is driven by requests for changes and includes change impact analysis, release planning, and change implementation.

Legacy systems are older software systems, developed using obsolete software and hardware technologies, that remain useful for a business.

It is often cheaper and less risky to maintain a legacy system than to develop a replacement system using modern technology.

The business value of a legacy system and the quality of the application software and its environment should be assessed to determine whether a system should be replaced, transformed, or maintained.

There are three types of software maintenance, namely, bug fixing, modifying software to work in a new environment, and implementing new or changed requirements.

Software reengineering is concerned with restructuring and redocumenting software to make it easier to understand and change.

Refactoring, making small program changes that preserve functionality, can be thought of as preventative maintenance.

Further Reading

Working Effectively with Legacy Code. Solid practical advice on the problems and difficulties of dealing with legacy systems. (M. Feathers, 2004, John Wiley & Sons).

“The Economics of Software Maintenance in the 21st Century.” This article is a general introduction to maintenance and a comprehensive discussion of maintenance costs. Jones discusses the factors that affect maintenance costs and suggests that almost 75% of the software workforce are involved in software maintenance activities. (C. Jones, 2006) http://www.compaid.com/caiinternet/ezine/capersjones-maintenance.pdf

“You Can’t Be Agile in Maintenance?” In spite of the title, this blog post argues that agile techniques are appropriate for maintenance and discusses which techniques, as suggested in XP, can be effective. (J. Bird, 2011) http://swreflections.blogspot.co.uk/2011/10/you-cant-be-agile-in-maintenance.html

“Software Reengineering and Testing Considerations.” This is an excellent summary white paper of maintenance issues from a major Indian software company. (Y. Kumar and Dipti, 2012) http://www.infosys.com/engineering-services/white-papers/Documents/software-re-engineering-processes.pdf


Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/implementation-and-evolution/

Exercises

9.1. Explain how advances in technology can force a software subsystem to undergo change or run the risk of becoming useless.

9.2. From Figure 9.4, you can see that impact analysis is an important subprocess in the software evolution process. Using a diagram, suggest what activities might be involved in change impact analysis.

9.3. Explain why legacy systems should be thought of as sociotechnical systems rather than simply software systems that were developed using old technology.

9.4. Some software subsystems are seen as “low quality, high business value.” Discuss how those subsystems can be re-engineered with minimal impact on the operations of the organization.

9.5. What are the strategic options for legacy system evolution? When would you normally replace all or part of a system rather than continue maintenance of the software?

9.6. Explain why problems with support software might mean that an organization has to replace its legacy systems.

9.7. As a software project manager in a company that specializes in the development of software for the offshore oil industry, you have been given the task of discovering the factors that affect the maintainability of the systems developed by your company. Suggest how you might set up a program to analyze the maintenance process and determine appropriate maintainability metrics for the company.

9.8. Briefly describe the three main types of software maintenance. Why is it sometimes difficult to distinguish between them?

9.9. Explain the differences between software reengineering and refactoring.

9.10. Do software engineers have a professional responsibility to develop code that can be easily maintained even if their employer does not explicitly request it?

References

Banker, R. D., S. M. Datar, C. F. Kemerer, and D. Zweig. 1993. “Software Complexity and Maintenance Costs.” Comm. ACM 36 (11): 81–94. doi:10.1145/163359.163375.

Coleman, D., D. Ash, B. Lowther, and P. Oman. 1994. “Using Metrics to Evaluate Software System Maintainability.” IEEE Computer 27 (8): 44–49. doi:10.1109/2.303623.

Davidsen, M. G., and J. Krogstie. 2010. “A Longitudinal Study of Development and Maintenance.” Information and Software Technology 52 (7): 707–719. doi:10.1016/j.infsof.2010.03.003.

Erlikh, L. 2000. “Leveraging Legacy System Dollars for E-Business.” IT Professional 2 (3): 17–23. doi:10.1109/6294.846201.

Fowler, M., K. Beck, J. Brant, W. Opdyke, and D. Roberts. 1999. Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley.

Hopkins, R., and K. Jenkins. 2008. Eating the IT Elephant: Moving from Greenfield Development to Brownfield. Boston: IBM Press.

Jones, T. C. 2006. “The Economics of Software Maintenance in the 21st Century.” www.compaid.com/caiinternet/ezine/capersjones-maintenance.pdf.

Kerievsky, J. 2004. Refactoring to Patterns. Boston: Addison-Wesley.

Kozlov, D., J. Koskinen, M. Sakkinen, and J. Markkula. 2008. “Assessing Maintainability Change over Multiple Software Releases.” J. of Software Maintenance and Evolution 20 (1): 31–58. doi:10.1002/smr.361.

Lientz, B. P., and E. B. Swanson. 1980. Software Maintenance Management. Reading, MA: Addison-Wesley.

Mitchell, R. M. 2012. “COBOL on the Mainframe: Does It Have a Future?” Computerworld US. http://features.techworld.com/applications/3344704/cobol-on-the-mainframe-does-it-have-a-future/

O’Hanlon, C. 2006. “A Conversation with Werner Vogels.” ACM Queue 4 (4): 14–22. doi:10.1145/1142055.1142065.

Rajlich, V. T., and K. H. Bennett. 2000. “A Staged Model for the Software Life Cycle.” IEEE Computer 33 (7): 66–71. doi:10.1109/2.869374.

Ulrich, W. M. 1990. “The Evolutionary Growth of Software Reengineering and the Decade Ahead.” American Programmer 3 (10): 14–20.

Warren, I. (ed.). 1998. The Renaissance of Legacy Systems. London: Springer.

PART 2 Dependability and Security

As software systems are now part of all aspects of our lives, I believe that the most significant challenge that we face in software engineering is ensuring that we can trust these systems. To trust a system, we must have confidence that it will be available when required and perform as expected. It must be secure so that our computers or data are not threatened by it, and it has to recover quickly in the event of failure or cyberattack. This part of the book therefore focuses on the important topics of software system dependability and security.

Chapter 10 introduces the basic concepts of dependability and security, namely, reliability, availability, safety, security, and resilience. I explain why building secure, dependable systems is not simply a technical problem. I introduce redundancy and diversity as the fundamental mechanisms used to create dependable and secure systems. The individual dependability attributes are covered in more detail in the following chapters.

Chapter 11 focuses on reliability and availability, and I explain how these attributes can be specified as probabilities of failure or downtime. I discuss a number of architectural patterns for fault-tolerant system architectures and development techniques that can be used to reduce the number of faults in a system. In the final section, I explain how the reliability of a system may be tested and measured.

More and more systems are safety-critical systems, where system failure can endanger people. Chapter 12 is concerned with safety engineering and techniques that may be used to develop these safety-critical systems. I explain why safety is a broader notion than reliability and discuss methods for deriving system safety requirements. I also explain why defined and documented processes for safety-critical systems engineering are important, and describe software safety cases, structured documents that are used to justify why a system is safe.

Threats to the security of our systems are one of the major problems faced by today’s societies, and I devote two chapters to this topic. Chapter 13 is concerned with application security engineering: methods used to achieve security in individual software systems. I explain the relationships between security and other dependability attributes and cover security requirements engineering, secure systems design, and security testing.

Chapter 14 is a new chapter that addresses the broader issue of resilience. A resilient system can continue to deliver its essential services even when individual parts of the system fail or are subject to a cyberattack. I explain the basics of cybersecurity and discuss how resilience is achieved by using redundancy and diversity and by empowering people, as well as through technical mechanisms. Finally, I discuss systems and software design issues that can contribute to improving the resilience of a system.

10

Dependable systems

Objectives

The objective of this chapter is to introduce the topic of software dependability and what is involved in developing dependable software systems. When you have read this chapter, you will:

understand why dependability and security are important attributes for all software systems;

understand the five important dimensions of dependability, namely, availability, reliability, safety, security, and resilience;

understand the notion of sociotechnical systems and why we have to consider these systems as a whole rather than just software systems;

know why redundancy and diversity are the fundamental concepts used in achieving dependable systems and processes;

be aware of the potential for using formal methods in dependable systems engineering.

Contents

10.1 Dependability properties

10.2 Sociotechnical systems

10.3 Redundancy and diversity

10.4 Dependable processes

10.5 Formal methods and dependability


As computer systems have become deeply embedded in our business and personal lives, the problems that result from system and software failure are increasing. A failure of server software in an e-commerce company could lead to a major loss of revenue and customers for that company. A software error in an embedded control system in a car could lead to expensive recalls of that model for repair and, in the worst case, could be a contributory factor in accidents. The infection of company PCs with malware requires expensive clean-up operations to sort out the problem and could lead to the loss of or damage to sensitive information.

Because software-intensive systems are so important to governments, companies, and individuals, we have to be able to trust these systems. The software should be available when it is needed, and it should operate correctly without undesirable side effects, such as unauthorized information disclosure. In short, we should be able to depend on our software systems.

The term dependability was proposed by Jean-Claude Laprie in 1995 to cover the related systems attributes of availability, reliability, safety, and security. His ideas were revised over the next few years and are discussed in a definitive paper published in 2004 (Avizienis et al. 2004). As I discuss in Section 10.1, these properties are inextricably linked, so having a single term to cover them all makes sense.

The dependability of systems is usually more important than their detailed functionality for the following reasons:

1. System failures affect a large number of people Many systems include functionality that is rarely used. If this functionality were left out of the system, only a small number of users would be affected. System failures that affect the availability of a system potentially affect all users of the system. Unavailable systems may mean that normal business is impossible.

2. Users often reject systems that are unreliable, unsafe, or insecure If users find that a system is unreliable or insecure, they will refuse to use it. Furthermore, they may also refuse to buy or use other products from the company that produced the unreliable system. They do not want a repetition of their bad experience with an undependable system.

3. System failure costs may be enormous For some applications, such as a reactor control system or an aircraft navigation system, the cost of system failure is orders of magnitude greater than the cost of the control system. Failures in systems that control critical infrastructure such as the power network have widespread economic consequences.

4. Undependable systems may cause information loss Data is very expensive to collect and maintain; it is usually worth much more than the computer system on which it is processed. The cost of recovering lost or corrupt data is usually very high.

However, a system can be useful without it being very dependable. I don’t think that the word processor that I used to write this book is a very dependable system. It sometimes freezes and has to be restarted. Nevertheless, because it is very useful,


Critical systems

Some classes of system are “critical systems” where system failure may result in injury to people, damage to the environment, or extensive economic losses. Examples of critical systems include embedded systems in medical devices, such as an insulin pump (safety-critical), spacecraft navigation systems (mission-critical), and online money transfer systems (business-critical).

Critical systems are very expensive to develop. Not only must they be developed so that failures are very rare, but they must also include recovery mechanisms to be used if and when failures occur.

http://software-engineering-book.com/web/critical-systems/

I am prepared to tolerate occasional failure. However, to reflect my lack of trust in the system, I save my work frequently and keep multiple backup copies of it. I compensate for the lack of system dependability by actions that limit the damage that could result from system failure.

Building dependable software is part of the more general process of dependable systems engineering. As I discuss in Section 10.2, software is always part of a broader system. It executes in an operational environment that includes the hardware on which the software executes, the human users of that software, and the organizational or business processes where the software is used. When designing a dependable system, you therefore have to consider:

1. Hardware failure System hardware may fail because of mistakes in its design, because components fail as a result of manufacturing errors, because of environmental factors such as dampness or high temperatures, or because components have reached the end of their natural life.

2. Software failure System software may fail because of mistakes in its specification, design, or implementation.

3. Operational failure Human users may fail to use or operate the system as intended by its designers. As hardware and software have become more reliable, failures in operation are now, perhaps, the largest single cause of system failures.

These failures are often interrelated. A failed hardware component may mean system operators have to cope with an unexpected situation and additional workload. This puts them under stress, and people under stress often make mistakes. These mistakes can cause the software to fail, which means more work for operators, even more stress, and so on.

As a result, it is particularly important that designers of dependable, software-intensive systems take a holistic sociotechnical systems perspective rather than focus on a single aspect of the system, such as its software or hardware. If hardware, software, and operational processes are designed separately, without taking into account the potential weaknesses of other parts of the system, then it is more likely that errors will occur at the interfaces between the different parts of the system.


10.1 Dependability properties

All of us are familiar with the problem of computer system failure. For no obvious reason, our computers sometimes crash or go wrong in some way. Programs running on these computers may not operate as expected and occasionally may corrupt the data that is managed by the system. We have learned to live with these failures, but few of us completely trust the personal computers that we normally use.

The dependability of a computer system is a property of the system that reflects its trustworthiness. Trustworthiness here essentially means the degree of confidence a user has that the system will operate as they expect and that the system will not “fail” in normal use. It is not meaningful to express dependability numerically. Rather, relative terms such as “not dependable,” “very dependable,” and “ultra-dependable” can reflect the degree of trust that we might have in a system.

There are five principal dimensions to dependability, as I have shown in Figure 10.1.

1. Availability Informally, the availability of a system is the probability that it will be up and running and able to deliver useful services to users at any given time.

2. Reliability Informally, the reliability of a system is the probability, over a given period of time, that the system will correctly deliver services as expected by the user.

3. Safety Informally, the safety of a system is a judgment of how likely it is that the system will cause damage to people or its environment.

4. Security Informally, the security of a system is a judgment of how likely it is that the system can resist accidental or deliberate intrusions.

5. Resilience Informally, the resilience of a system is a judgment of how well that system can maintain the continuity of its critical services in the presence of disruptive events, such as equipment failure and cyberattacks. Resilience is a more recent addition to the set of dependability properties that were originally suggested by Laprie.
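Availability, the first of these dimensions, is commonly approximated from measured uptime and downtime. A minimal sketch (the numbers used in the comment are illustrative):

```python
# Availability estimated as the fraction of time the system is up
# and able to deliver useful services.
def availability(uptime_hours, downtime_hours):
    return uptime_hours / (uptime_hours + downtime_hours)

# About 8.8 hours of downtime in a year (8760 hours) corresponds
# roughly to "three nines" (0.999) availability.
```

Note that this simple ratio says nothing about when the downtime occurs; an hour of downtime during peak use matters far more than an hour at night.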

The dependability properties shown in Figure 10.1 are complex properties that can be broken down into several simpler properties. For example, security includes “integrity” (ensuring that the system’s programs and data are not damaged) and “confidentiality” (ensuring that information can only be accessed by people who are authorized). Reliability includes “correctness” (ensuring the system services are as specified), “precision” (ensuring information is delivered at an appropriate level of detail), and “timeliness” (ensuring that information is delivered when it is required).

Of course, not all dependability properties are critical for all systems. For the insulin pump system, introduced in Chapter 1, the most important properties are reliability (it must deliver the correct dose of insulin) and safety (it must never deliver a dangerous dose of insulin). Security is not an issue as the pump does not store confidential information. It is not networked and so cannot be maliciously attacked. For the wilderness weather system, availability and reliability are the most important properties because the costs of repair may be very high. For the Mentcare patient information system, security and resilience are particularly important because of the sensitive private data that is maintained and the need for the system to be available for patient consultations.

Figure 10.1 Principal dependability properties: availability (the ability of the system to deliver services when requested); reliability (the ability of the system to deliver services as specified); safety (the ability of the system to operate without catastrophic failure); security (the ability of the system to protect itself against deliberate or accidental intrusion); resilience (the ability of the system to resist and recover from damaging events).

Other system properties are closely related to these five dependability properties and influence a system’s dependability:

1. Repairability System failures are inevitable, but the disruption caused by failure can be minimized if the system can be repaired quickly. It must be possible to diagnose the problem, access the component that has failed, and make changes to fix that component. Repairability in software is enhanced when the organization using the system has access to the source code and has the skills to make changes to it. Open-source software makes this easier, but the reuse of components can make it more difficult.

2. Maintainability As systems are used, new requirements emerge, and it is important to maintain the value of a system by changing it to include these new requirements. Maintainable software is software that can be adapted economically to cope with new requirements, and where there is a low probability that making changes will introduce new errors into the system.

3. Error tolerance This property can be considered as part of usability and reflects the extent to which the system has been designed so that user input errors are avoided and tolerated. When user errors occur, the system should, as far as possible, detect these errors and either fix them automatically or request the user to re-input their data.
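An error-tolerant input routine of the kind described in point 3 might look like the following sketch, which repairs an obvious slip automatically and asks for re-input otherwise. The date format and the message wording are illustrative assumptions.

```python
from datetime import datetime

def parse_entry_date(text):
    """Detect input errors; repair obvious slips, otherwise ask for re-input."""
    cleaned = text.strip().replace("/", "-")  # tolerate 2025/01/31-style slips
    try:
        return datetime.strptime(cleaned, "%Y-%m-%d").date(), "ok"
    except ValueError:
        return None, "Please re-enter the date in YYYY-MM-DD form"
```

The point is the division of labor: errors the system can confidently repair are fixed silently, and anything ambiguous is returned to the user rather than guessed at.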

The notion of system dependability as an encompassing property was introduced because the dependability properties of availability, security, reliability, safety, and resilience are closely related. Safe system operation usually depends on the system being available and operating reliably. A system may become unreliable because an intruder has corrupted its data. Denial-of-service attacks on a system are intended to compromise the system’s availability. If a system is infected with a virus, you cannot then be confident in its reliability or safety because the virus may change its behavior.

To develop dependable software, you therefore need to ensure that:

1. You avoid the introduction of accidental errors into the system during software specification and development.

2. You design verification and validation processes that are effective in discovering residual errors that affect the dependability of the system.

3. You design the system to be fault tolerant so that it can continue working when things go wrong.

4. You design protection mechanisms that guard against external attacks that can compromise the availability or security of the system.

5. You configure the deployed system and its supporting software correctly for its operating environment.

6. You include system capabilities to recognize external cyberattacks and to resist these attacks.

7. You design systems so that they can quickly recover from system failures and cyberattacks without the loss of critical data.

The need for fault tolerance means that dependable systems have to include redundant code to help them monitor themselves, detect erroneous states, and recover from faults before failures occur. This affects the performance of systems, as additional checking is required each time the system executes. Therefore, designers usually have to trade off performance and dependability. You may need to leave checks out of the system because these slow the system down. However, the consequential risk here is that the system fails because a fault has not been detected.
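The redundant checking described above can be illustrated with a simple majority-voting sketch: several diverse implementations of the same calculation are all executed and compared, so an erroneous state in one version is detected before it becomes a failure. The square-root implementations are illustrative stand-ins, and running three versions instead of one is exactly the performance cost the text refers to.

```python
import math

def majority_vote(results, tolerance=1e-9):
    """Return a value that at least two versions agree on; otherwise an
    erroneous state has been detected and is reported before it can
    propagate into a system failure."""
    for a in results:
        if sum(1 for b in results if abs(a - b) <= tolerance) >= 2:
            return a
    raise RuntimeError("no majority: erroneous state detected")

def sqrt_newton(x, iterations=60):
    """A deliberately independent (diverse) square-root implementation."""
    guess = x if x > 0 else 0.0
    for _ in range(iterations):
        if guess == 0.0:
            break
        guess = 0.5 * (guess + x / guess)  # Newton-Raphson step
    return guess

def protected_sqrt(x):
    # Three diverse versions of the same calculation are all executed;
    # the tripled computation is the price of this redundancy.
    return majority_vote([math.sqrt(x), x ** 0.5, sqrt_newton(x)])
```

Hardware fault-tolerant systems use the same idea (triple modular redundancy); in software, the versions must be diverse, or they will tend to fail in the same way.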

Building dependable systems is expensive. Increasing the dependability of a system means that you incur extra costs for system design, implementation, and validation. Verification and validation costs are particularly high for systems that must be ultra-dependable, such as safety-critical control systems. As well as validating that the system meets its requirements, the validation process may have to prove to an external regulator that the system is safe. For example, aircraft systems have to demonstrate to regulators, such as the Federal Aviation Administration, that the probability of a catastrophic system failure that affects aircraft safety is extremely low.

Figure 10.2 shows the relationship between costs and incremental improvements in dependability. If your software is not very dependable, you can get significant improvements fairly cheaply by using better software engineering. However, if you are already using good practice, the costs of improvement are much greater, and the benefits from that improvement are less.

There is also the problem of testing software to demonstrate that it is dependable. Solving this problem relies on running many tests and looking at the number of failures that occur. As your software becomes more dependable, you see fewer and fewer failures. Consequently, more and more tests are needed to try and assess how many problems remain in the software. Testing is a very expensive process, so this can significantly increase the cost of high-dependability systems.

Figure 10.2 Cost/dependability curve. Cost rises increasingly steeply as dependability moves from low through medium, high, and very high to ultra-high.

10.2 Sociotechnical systems

In a computer system, the software and the hardware are interdependent. Without hardware, a software system is an abstraction, which is simply a representation of some human knowledge and ideas. Without software, hardware is a set of inert electronic devices. However, if you put them together to form a system, you create a machine that can carry out complex computations and deliver the results of these computations to its environment.

This illustrates one of the fundamental characteristics of a system: it is more than the sum of its parts. Systems have properties that become apparent only when their components are integrated and operate together. Software systems are not isolated systems but are part of more extensive systems that have a human, social, or organizational purpose. Therefore software engineering is not an isolated activity but is an intrinsic part of systems engineering (Chapter 19).

For example, the wilderness weather system software controls the

instruments in

a weather station. It communicates with other software systems and is a

part of wider national and international weather forecasting systems. As

well as hardware and

software, these systems include processes for forecasting the weather and

people

who operate the system and analyze its outputs. The system also includes

the organ-

izations that depend on the system to help them provide weather forecasts

to indi-

viduals, government and industry.

Figure 10.3 The sociotechnical systems stack (layers from top to bottom: society, organization, business processes, application system, communications and data management, operating system, equipment; the diagram also labels the scope of systems engineering and software engineering)

These broader systems are called sociotechnical systems. They include nontechnical elements such as people, processes, and regulations, as well as technical components such as computers, software, and other equipment. System dependability is influenced by all of the elements in a sociotechnical system: hardware, software, people, and organizations.

Sociotechnical systems are so complex that it is impossible to understand them as a whole. Rather, you have to view them as layers, as shown in Figure 10.3. These layers make up the sociotechnical systems stack:

1. The equipment layer is composed of hardware devices, some of which may be computers.

2. The operating system layer interacts with the hardware and provides a set of common facilities for higher software layers in the system.

3. The communications and data management layer extends the operating system facilities and provides an interface that allows interaction with more extensive functionality, such as access to remote systems and access to a system database. This is sometimes called middleware, as it is in between the application and the operating system.

4. The application layer delivers the application-specific functionality that is required. There may be many different application programs in this layer.

5. The business process layer includes the organizational business processes, which make use of the software system.

6. The organizational layer includes higher-level strategic processes as well as business rules, policies, and norms that should be followed when using the system.

7. The social layer refers to the laws and regulations of society that govern the operation of the system.


Notice that there is no separate "software layer." Software of one kind or another is an important part of all of the layers in the sociotechnical system. Equipment is controlled by embedded software; the operating system and applications are software. Business processes, organizations, and society rely on the Internet (software) and other global software systems.

In principle, most interactions should be between neighboring layers in the stack, with each layer hiding the detail of the layer below from the layer above. In practice, however, there can be unexpected interactions between layers, which result in problems for the system as a whole. For example, say there is a change in the law governing access to personal information. This change comes from the social layer. It leads to new organizational procedures and changes to the business processes. The application system itself may not be able to provide the required level of privacy, so changes may have to be implemented in the communications and data management layer.

Thinking holistically about systems, rather than simply considering software in isolation, is essential when considering software security and dependability. Software itself is intangible and, even when damaged, is easily and cheaply restored. However, when software failures ripple through other parts of the system, they affect the software's physical and human environment, where the consequences of failure are more significant: equipment may be damaged, important data may be lost or corrupted, or confidentiality may be breached, with unknown consequences. People may have to do extra work to contain or recover from the failure.

You must, therefore, take a system-level view when you are designing software that has to be dependable and secure. You have to take into account the consequences of software failures for other elements in the system. You also need to understand how these other system elements may be the cause of software failure and how they can help to protect against and recover from software failures.

It is important to ensure that, wherever possible, software failure does not lead to overall system failure. You must therefore examine how the software interacts with its immediate environment to ensure that:

1. Software failures are, as far as possible, contained within the enclosing layer of the system stack and do not seriously affect the operation of other layers in the system.

2. You understand how faults and failures in the other layers of the systems stack may affect the software. You may also consider how checks may be built into the software to help detect these failures, and how support can be provided for recovering from failure.

As software is inherently flexible, unexpected system problems are often left to software engineers to solve. Say a radar installation has been sited so that ghosting of the radar image occurs. It is impractical to move the radar to a site with less interference, so the systems engineers have to find another way of removing this ghosting. Their solution may be to enhance the image-processing capabilities of the software to remove the ghost images. This may slow down the software so that its performance becomes unacceptable. The problem may then be characterized as a software failure, whereas, in fact, it is a failure in the design process for the system as a whole.

This sort of situation, in which software engineers are left with the problem of enhancing software capabilities without increasing hardware cost, is very common. Many so-called software failures are not a consequence of inherent software problems but rather are the result of trying to change the software to accommodate modified system engineering requirements. A good example was the failure of the Denver airport baggage system (Swartz 1996), where the controlling software was expected to deal with limitations of the equipment used.

10.2.1 Regulation and compliance

The general model of economic organization that is now almost universal in the world is that privately owned companies offer goods and services and make a profit on these. We have a competitive environment so that these companies may compete on cost, on quality, on delivery time, and so on. However, to ensure the safety of their citizens, most governments limit the freedom of privately owned companies so that they must follow certain standards to ensure that their products are safe and secure. A company therefore cannot offer products for sale more cheaply because it has reduced its costs by reducing the safety of its products.

Governments have created a set of rules and regulations in different areas that define standards for safety and security. They have also established regulators or regulatory bodies whose job is to ensure that companies offering products in an area comply with these rules. Regulators have wide powers. They can fine companies and even imprison directors if regulations are breached. They may have a licensing role (e.g., in the aviation and nuclear industries) where they must issue a license before a new system may be used. Therefore, aircraft manufacturers have to have a certificate of airworthiness from the regulator in each country where the aircraft is used.

To achieve certification, companies that are developing safety-critical systems have to produce an extensive safety case (discussed in Chapter 13) that shows that rules and regulations have been followed. The case must convince a regulator that the system can operate safely. Developing such a safety case is very costly. It can be as expensive to develop the documentation for certification as it is to develop the system itself.

Regulation and compliance (following the rules) applies to the sociotechnical system as a whole and not simply the software element of that system. For example, a regulator in the nuclear industry is concerned that in the event of overheating, a nuclear reactor will not release radioactivity into the environment. Arguments to convince the regulator that this is the case may be based on software protection systems, the operational process used to monitor the reactor core, and the integrity of structures that contain any release of radioactivity.


Each of these elements has to have its own safety case. So, the protection system must have a safety case that demonstrates that the software will operate correctly and shut down the reactor as intended. The overall case must also show that if the software protection system fails, there are alternative safety mechanisms, which do not rely on software, that are invoked.

10.3 Redundancy and diversity

Component failures in any system are inevitable. People make mistakes, undiscovered bugs in software cause undesirable behavior, and hardware burns out. We use a range of strategies to reduce the number of these failures, such as replacing hardware components before the end of their predicted lifetime and checking software using static analysis tools. However, we cannot be sure that these strategies will eliminate component failures. We should therefore design systems so that individual component failures do not lead to overall system failure.

Strategies to achieve and enhance dependability rely on both redundancy and diversity. Redundancy means that spare capacity is included in a system that can be used if part of that system fails. Diversity means that redundant components of the system are of different types, thus increasing the chances that they will not fail in exactly the same way.

We use redundancy and diversity to enhance dependability in our everyday lives. Commonly, to secure our homes we use more than one lock (redundancy), and, usually, the locks used are of different types (diversity). This means that if intruders find a way to defeat one of the locks, they have to find a different way of defeating the other locks before they can gain entry. As a matter of routine, we should all back up our computers and so maintain redundant copies of our data. To avoid problems with disk failure, backups should be kept on a separate, diverse, external device.

Software systems that are designed for dependability may include redundant components that provide the same functionality as other system components. These are switched into the system if the primary component fails. If these redundant components are diverse, that is, not the same as other components, a common fault in replicated components will not result in a system failure. Another form of redundancy is the inclusion of checking code, which is not strictly necessary for the system to function. This code can detect some kinds of problems, such as data corruption, before they cause failures. It can invoke recovery mechanisms to correct problems to ensure that the system continues to operate.
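The combination of checking code and a recovery mechanism can be sketched in a few lines. This is an illustration only: the function names and the use of a CRC-32 checksum with a single backup copy are choices made for the example, not something prescribed in the text.

```python
import zlib

def store(record: bytes) -> dict:
    """Store a record together with a checksum and a redundant copy."""
    return {
        "primary": record,
        "backup": record,                 # redundant copy (redundancy)
        "checksum": zlib.crc32(record),   # checking-code metadata
    }

def read(stored: dict) -> bytes:
    """Return the record, falling back to the backup if the primary is corrupted."""
    if zlib.crc32(stored["primary"]) == stored["checksum"]:
        return stored["primary"]
    # Checking code has detected corruption; invoke the recovery mechanism.
    if zlib.crc32(stored["backup"]) == stored["checksum"]:
        stored["primary"] = stored["backup"]   # repair the primary copy
        return stored["backup"]
    raise ValueError("both copies corrupted; no recovery possible")

s = store(b"sensor reading: 21.5C")
s["primary"] = b"sensor reading: 99.9C"   # simulate data corruption in transit
print(read(s))                             # prints b'sensor reading: 21.5C'
```

Note that the checking code here is not needed for the system's basic function; it exists only to detect corruption before the bad value is used, exactly the role described above.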

In systems for which availability is a critical requirement, redundant servers are normally used. These automatically come into operation if a designated server fails. Sometimes, to ensure that attacks on the system cannot exploit a common vulnerability, these servers may be of different types and may run different operating systems. Using different operating systems is an example of software diversity and redundancy, where similar functionality is provided in different ways. (I discuss software diversity in more detail in Chapter 12.)

The Ariane 5 explosion

In 1996, the European Space Agency's Ariane 5 rocket exploded 37 seconds after lift-off on its maiden flight. The fault was caused by a software systems failure. There was a backup system, but it was not diverse, and so the software in the backup computer failed in exactly the same way. The rocket and its satellite payload were destroyed.

http://software-engineering-book.com/web/ariane/

Diversity and redundancy may also be used in the design of dependable software development processes. Dependable development processes avoid the introduction of faults into a system. In a dependable process, activities such as software validation do not rely on a single tool or technique. This improves software dependability because it reduces the chances of process failure, where human errors made during the software development process lead to software errors.

For example, validation activities may include program testing, manual program inspections, and static analysis as fault-finding techniques. Any one of these techniques might find faults that are missed by the other methods. Furthermore, different team members may be responsible for the same process activity (e.g., a program inspection). People tackle tasks in different ways depending on their personality, experience, and education, so this kind of redundancy provides a diverse perspective on the system.
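The complementary nature of these techniques can be shown with a small sketch. The seeded fault (a bare `except` that silently swallows a division-by-zero error) and the toy static check are invented for this illustration; real static analysis tools perform far richer checks.

```python
import ast

# Source of a small function with a seeded fault: the bare 'except'
# silently swallows errors such as division by zero.
SOURCE = '''
def scale_readings(readings, factor):
    try:
        return [r / factor for r in readings]
    except:
        return []
'''

# Technique 1: program testing finds the faulty *behavior* dynamically.
namespace = {}
exec(SOURCE, namespace)
scale_readings = namespace["scale_readings"]
testing_found_fault = scale_readings([10.0, 20.0], 0) == []  # data silently lost

# Technique 2: static analysis flags the risky *construct* without running it.
tree = ast.parse(SOURCE)
static_warnings = [
    f"bare 'except' on line {node.lineno}"
    for node in ast.walk(tree)
    if isinstance(node, ast.ExceptHandler) and node.type is None
]

print(testing_found_fault)   # True: the test exposes the silent failure
print(static_warnings)       # ["bare 'except' on line 5"]
```

Each technique surfaces the problem in a different way: the test observes the wrong output, while the static check locates the construct responsible even on paths no test exercises. Using both, as a diverse process does, catches more than either alone.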

However, as I discuss in Chapter 11, using software redundancy and diversity can itself introduce bugs into software. Diversity and redundancy make systems more complex and usually harder to understand. Not only is there more code to write and check, but additional functionality must also be added to the system to detect component failure and to switch control to alternative components. This additional complexity means that it is more likely that programmers will make errors and less likely that people checking the system will find these errors.

Some engineers therefore think that, as software cannot wear out, it is best to avoid software redundancy and diversity. Their view is that the best approach is to design the software to be as simple as possible, with extremely rigorous software verification and validation procedures (Parnas, van Schouwen, and Shu 1990). More can be spent on verification and validation because of the savings that result from not having to develop redundant software components.

Both approaches are used in commercial, safety-critical software systems. For example, the Airbus A340 flight control hardware and software is both diverse and redundant. The flight control software on the Boeing 777 runs on redundant hardware, but each computer runs the same software, which has been very extensively validated. The Boeing 777 flight control system designers have focused on simplicity rather than redundancy. Both of these aircraft are very reliable, so both the diverse and the simple approaches to dependability can clearly be successful.


Dependable operational processes

This chapter discusses dependable development processes, but system operational processes are equally important contributors to system dependability. In designing these operational processes, you have to take into account human factors and always bear in mind that people are liable to make mistakes when using a system. A dependable process should be designed to avoid human errors, and, when mistakes are made, the software should detect the mistakes and allow them to be corrected.

http://software-engineering-book.com/web/human-error/

10.4 Dependable processes

Dependable software processes are software processes that are designed to produce dependable software. The rationale for investing in dependable processes is that a good software process is likely to lead to delivered software that contains fewer errors and is therefore less likely to fail in execution. A company using a dependable process can be sure that the process has been properly enacted and documented and that appropriate development techniques have been used for critical systems development. Figure 10.4 shows some of the attributes of dependable software processes.

The evidence that a dependable process has been used is often important in convincing a regulator that the most effective software engineering practice has been applied in developing the software. System developers will normally present a model of the process to a regulator, along with evidence that the process has been followed. The regulator also has to be convinced that the process is used consistently by all of the process participants and that it can be used in different development projects. This means that the process must be explicitly defined and repeatable:

1. An explicitly defined process is one that has a defined process model that is used to drive the software production process. Data must be collected during the process that proves that the development team has followed the process as defined in the process model.

2. A repeatable process is one that does not rely on individual interpretation and judgment. Rather, the process can be repeated across projects and with different team members, irrespective of who is involved in the development. This is particularly important for critical systems, which often have a long development cycle during which there are often significant changes in the development team.

Dependable processes make use of redundancy and diversity to achieve reliability. They often include different activities that have the same aim. For example, program inspections and testing aim to discover errors in a program. The approaches can be used together so that they are likely to find more errors than would be found using one technique on its own.

Figure 10.4 Attributes of dependable processes

Auditable: The process should be understandable by people apart from process participants, who can check that process standards are being followed and make suggestions for process improvement.

Diverse: The process should include redundant and diverse verification and validation activities.

Documentable: The process should have a defined process model that sets out the activities in the process and the documentation that is to be produced during these activities.

Robust: The process should be able to recover from failures of individual process activities.

Standardized: A comprehensive set of software development standards covering software production and documentation should be available.

The activities that are used in dependable processes obviously depend on the type of software that is being developed. In general, however, these activities should be geared toward avoiding the introduction of errors into a system, detecting and removing errors, and maintaining information about the process itself. Examples of activities that might be included in a dependable process include:

1. Requirements reviews to check that the requirements are, as far as possible, complete and consistent.

2. Requirements management to ensure that changes to the requirements are controlled and that the impact of proposed requirements changes is understood by all developers affected by the change.

3. Formal specification, where a mathematical model of the software is created and analyzed. (I discuss the benefits of formal specification in Section 10.5.) Perhaps its most important benefit is that it forces a very detailed analysis of the system requirements. This analysis itself is likely to discover requirements problems that may have been missed in requirements reviews.

4. System modeling, where the software design is explicitly documented as a set of graphical models and the links between the requirements and these models are explicitly documented. If a model-driven engineering approach is used (see Chapter 5), code may be generated automatically from these models.

5. Design and program inspections, where the different descriptions of the system are inspected and checked by different people. A checklist of common design and programming errors may be used to focus the inspection process.

6. Static analysis, where automated checks are carried out on the source code of the program. These look for anomalies that could indicate programming errors or omissions. (I cover static analysis in Chapter 12.)

7. Test planning and management, where a comprehensive set of system tests is designed. The testing process has to be carefully managed to demonstrate that these tests provide coverage of the system requirements and have been correctly applied in the testing process.


As well as process activities that focus on system development and testing, there must also be well-defined quality management and change management processes. While the specific activities in a dependable process may vary from one company to another, the need for effective quality and change management is universal.

Quality management processes (covered in Chapter 24) establish a set of process and product standards. They also include activities that capture process information to demonstrate that these standards have been followed. For example, there may be a standard defined for carrying out program inspections. The inspection team leader is responsible for documenting the process to show that the inspection standard has been followed.

Change management, discussed in Chapter 25, is concerned with managing changes to a system, ensuring that accepted changes are actually implemented, and confirming that planned releases of the software include the planned changes. One common problem with software is that the wrong components are included in a system build. This can lead to a situation where an executing system includes components that have not been checked during the development process. Configuration management procedures must be defined as part of the change management process to ensure that this does not happen.

As agile methods have become increasingly used, researchers and practitioners have thought carefully about how to use agile approaches in dependable software development (Trimble 2012). Most companies that develop critical software systems have based their development on plan-based processes and have been reluctant to make radical changes to their development process. However, they recognize the value of agile approaches and are exploring how their dependable development processes can be made more agile.

As dependable software often requires certification, both process and product documentation have to be produced. Up-front requirements analysis is also essential to discover possible requirements and requirements conflicts that may compromise the safety and security of the system. Formal change analysis is essential to assess the effect of changes on the safety and integrity of the system. These requirements conflict with the general approach in agile development of co-development of the requirements and the system and minimizing documentation.

Although most agile development uses an informal, undocumented process, this is not a fundamental requirement of agility. An agile process may be defined that incorporates techniques such as iterative development, test-first development, and user involvement in the development team. As long as the team follows that process and documents their actions, agile techniques can be used. To support this notion, various proposals of modified agile methods have been made that incorporate the requirements of dependable systems engineering (Douglass 2013). These combine the most appropriate techniques from agile and plan-based development.

10.5 Formal methods and dependability

For more than 30 years, researchers have advocated the use of formal methods of software development. Formal methods are mathematical approaches to software development where you define a formal model of the software. You may then formally analyze this model to search for errors and inconsistencies, prove that a program is consistent with this model, or apply a series of correctness-preserving transformations to the model to generate a program. Abrial (Abrial 2009) claims that the use of formal methods can lead to "faultless systems," although he is careful to limit what he means in this claim.

In an excellent survey, Woodcock et al. (Woodcock et al. 2009) discuss industrial applications where formal methods have been successfully applied. These include train control systems (Badeau and Amelot 2005), cash card systems (Hall and Chapman 2002), and flight control systems (Miller et al. 2005). Formal methods are the basis of tools used in static verification, such as the driver verification system used by Microsoft (Ball et al. 2006).

Using a mathematically formal approach to software development was proposed at an early stage in the development of computer science. The idea was that a formal specification and a program could be developed independently. A mathematical proof could then be developed to show that the program and its specification were consistent. Initially, proofs were developed manually, but this was a long and expensive process. It quickly became clear that manual proofs could only be developed for very small systems. Program proving is now supported by large-scale automated theorem-proving software, which has meant that larger systems can be proved. However, developing the proof obligations for theorem provers is a difficult and specialized task, so formal verification is not widely used.

An alternative approach, which avoids a separate proof activity, is refinement-based development. Here, a formal specification of a system is refined through a series of correctness-preserving transformations to generate the software. Because these are trusted transformations, you can be confident that the generated program is consistent with its formal specification. This was the approach used in the software development for the Paris Metro system (Badeau and Amelot 2005). It used a language called B (Abrial 2010), which was designed to support specification refinement.

Formal methods based on model checking (Jhala and Majumdar 2009) have been used in a number of systems (Bochot et al. 2009; Calinescu and Kwiatkowska 2009). These systems rely on constructing or generating a formal state model of a system and using a model checker to check that properties of the model, such as safety properties, always hold. The model-checking program exhaustively analyzes the specification and either reports that the system property is satisfied by the model or presents an example that shows it is not satisfied. If a model can be automatically or systematically generated from a program, this means that bugs in the program can be uncovered. (I cover model checking in safety-critical systems in Chapter 12.)
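The core idea, exhaustively exploring every reachable state of a model and either confirming a safety property or returning a counterexample trace, can be sketched very simply. The two-traffic-light model and the function names here are invented for the illustration; real model checkers use far more sophisticated state representations.

```python
from collections import deque

# A toy state model: two traffic lights at an intersection.
# A state is a pair (light1, light2); each light independently
# steps red -> green -> red. The safety property is that the two
# lights are never green at the same time.
def successors(state):
    step = {"red": "green", "green": "red"}
    l1, l2 = state
    return [(step[l1], l2), (l1, step[l2])]

def check_safety(initial, successors, safe):
    """Breadth-first exploration of all reachable states.

    Returns None if the property holds everywhere, or a path
    (a counterexample trace) leading to an unsafe state.
    """
    frontier = deque([[initial]])
    visited = {initial}
    while frontier:
        path = frontier.popleft()
        if not safe(path[-1]):
            return path                    # counterexample found
        for nxt in successors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None                            # property holds in every reachable state

safe = lambda s: not (s[0] == "green" and s[1] == "green")
counterexample = check_safety(("red", "red"), successors, safe)
print(counterexample)
```

Because this naive model has no interlock between the lights, the checker finds a trace ending in the unsafe state ("green", "green"), exactly the kind of counterexample a model checker presents when a safety property is not satisfied.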

Formal methods for software engineering are effective for discovering or avoiding two classes of error in software representations:

1. Specification and design errors and omissions. The process of developing and analyzing a formal model of the software may reveal errors and omissions in the software requirements. If the model is generated automatically or systematically from source code, analysis using model checking can discover undesirable states that may occur, such as deadlock in a concurrent system.


2. Inconsistencies between a specification and a program. If a refinement method is used, mistakes made by developers that make the software inconsistent with the specification are avoided. Program proving discovers inconsistencies between a program and its specification.

Formal specification techniques

Formal system specifications may be expressed using two fundamental approaches: as models of the system interfaces (algebraic specifications) or as models of the system state. An extra web chapter on this topic shows examples of both of these approaches and includes a formal specification of part of the insulin pump system.

http://software-engineering-book.com/web/formal-methods/ (web chapter)

The starting point for all formal methods is a mathematical system model, which acts as a system specification. To create this model, you translate the system's user requirements, which are expressed in natural language, diagrams, and tables, into a mathematical language that has formally defined semantics. The formal specification is an unambiguous description of what the system should do.

Formal specifications are the most precise way of specifying systems and so reduce the scope for misunderstanding. Many supporters of formal methods believe that creating a formal specification, even without refinement or program proof, is worthwhile. Constructing a formal specification forces a detailed analysis of the requirements, and this is an effective way of discovering requirements problems. In a natural language specification, errors can be concealed by the imprecision of the language. This is not the case if the system is formally specified.
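As a small illustration of this precision, a state-based specification of an insert operation on a bounded buffer might be written as follows (the bounded-buffer operation and the notation are chosen for illustration, in the pre/postcondition style of model-based specification languages such as B or VDM):

```latex
\textbf{state: } \mathit{buf} \in \mathrm{seq}(\mathit{Item}),
  \quad \#\mathit{buf} \le \mathit{MAX} \\
\textbf{operation: } \mathit{insert}(x : \mathit{Item}) \\
\textbf{pre: } \#\mathit{buf} < \mathit{MAX} \\
\textbf{post: } \mathit{buf}' = \mathit{buf} \frown \langle x \rangle
```

Here $\mathit{buf}'$ denotes the buffer after the operation and $\frown$ is sequence concatenation. A natural-language version ("the buffer holds items up to some maximum") leaves unstated what happens when the buffer is full; the formal precondition forces that decision to be made explicitly.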

The advantages of developing a formal specification and using it in a formal development process are:

1. As you develop a formal specification in detail, you develop a deep and detailed understanding of the system requirements. Requirements problems that are discovered early are usually much cheaper to correct than if they are found at later stages in the development process.

2. As the specification is expressed in a language with formally defined semantics, you can analyze it automatically to discover inconsistencies and incompleteness.

3. If you use a method such as the B method, you can transform the formal specification into a program through a sequence of correctness-preserving transformations. The resulting program is therefore guaranteed to meet its specification.

4. Program testing costs may be reduced because you have verified the program against its specification. For example, in the development of the software for the Paris Metro system, the use of refinement meant that there was no need for software component testing and only system testing was required.


Woodcock's survey (Woodcock et al. 2009) found that users of formal methods reported fewer errors in the delivered software. Neither the costs nor the time needed for software development were higher than in comparable development projects. There were significant benefits in using formal approaches in safety-critical systems that required regulator certification. The documentation produced was an important part of the safety case (see Chapter 12) for the system.

In spite of these advantages, formal methods have had limited impact on practical software development, even for critical systems. Woodcock reports on 62 projects over 25 years that used formal methods. Even if we allow for projects that used these techniques but did not report their use, this is a tiny fraction of the total number of critical systems developed in that time. Industry has been reluctant to adopt formal methods for a number of reasons:

1. Problem owners and domain experts cannot understand a formal specification, so they cannot check that it accurately represents their requirements. Software engineers, who understand the formal specification, may not understand the application domain, so they too cannot be sure that the formal specification is an accurate reflection of the system requirements.

2. It is fairly easy to quantify the costs of creating a formal specification, but more difficult to estimate the possible cost savings that will result from its use. As a result, managers are unwilling to take the risk of adopting formal methods. They are unconvinced by reports of success as, by and large, these came from atypical projects where the developers were keen advocates of a formal approach.

3. Most software engineers have not been trained to use formal specification languages. Hence, they are reluctant to propose their use in development processes.

4. It is difficult to scale current formal methods up to very large systems. When formal methods are used, it is mostly for specifying critical kernel software rather than complete systems.

5. Tool support for formal methods is limited, and the available tools are often open source and difficult to use. The market is too small for commercial tool providers.

6. Formal methods are not compatible with agile development where programs are developed incrementally. This is not a major issue, however, as most critical systems are still developed using a plan-based approach.

Parnas, an early advocate of formal development, has criticized current formal methods and claims that these have started from a fundamentally wrong premise (Parnas 2010). He believes that these methods will not gain acceptance until they are radically simplified, which will require a different type of mathematics as a basis. My own view is that even this will not mean that formal methods are routinely adopted for critical systems engineering unless it can be clearly demonstrated that their adoption and use is cost-effective, compared to other software engineering methods.


Key Points

System dependability is important because failure of critical computer systems can lead to large economic losses, serious information loss, physical damage or threats to human life.

The dependability of a computer system is a system property that reflects the user’s degree of trust in the system. The most important dimensions of dependability are availability, reliability, safety, security, and resilience.

Sociotechnical systems include computer hardware, software, and people, and are situated within an organization. They are designed to support organizational or business goals and objectives.

The use of a dependable, repeatable process is essential if faults in a system are to be minimized. The process should include verification and validation activities at all stages, from requirements definition through to system implementation.

The use of redundancy and diversity in hardware, software processes, and software systems is essential to the development of dependable systems.

Formal methods, where a formal model of a system is used as a basis for development, help reduce the number of specification and implementation errors in a system. However, formal methods have had a limited take-up in industry because of concerns about the cost-effectiveness of this approach.

Further Reading

“Basic Concepts and Taxonomy of Dependable and Secure Computing.” This work presents a thorough discussion of dependability concepts written by some of the pioneers in the field who were responsible for developing these ideas. (A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr, IEEE Transactions on Dependable and Secure Computing, 1 (1), 2004) http://dx.doi.org/10.1109/TDSC.2004.2

“Formal Methods: Practice and Experience.” An excellent survey of the use of formal methods in industry, along with a description of some projects that have used formal methods. The authors present a realistic summary of the barriers to the use of these methods. (J. Woodcock, P. G. Larsen, J. Bicarregui, and J. Fitzgerald, Computing Surveys, 41 (1), January 2009) http://dx.doi.org/10.1145/1592434.1592436

The LSCITS Socio-technical Systems Handbook. This handbook introduces sociotechnical systems in an accessible way and provides access to more detailed papers on sociotechnical topics. (2012) http://archive.cs.st-andrews.ac.uk/STSE-Handbook/

Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/critical-systems/


Exercises

10.1. Suggest six reasons why software dependability is important in most sociotechnical systems.

10.2. Explain with an example why resilience to cyber attacks is a very important characteristic of system dependability.

10.3. Using an example, explain why it is important when developing dependable systems to consider these as sociotechnical systems and not simply as technical software and hardware systems.

10.4. Give two examples of government functions that are supported by complex sociotechnical systems and explain why, in the foreseeable future, these functions cannot be completely automated.

10.5. Explain the difference between redundancy and diversity.

10.6. Explain why it is reasonable to assume that the use of dependable processes will lead to the creation of dependable software.

10.7. Give two examples of diverse, redundant activities that might be incorporated into dependable processes.

10.8. Give two reasons why different versions of a system based on software diversity may fail in a similar way.

10.9. You are an engineer in charge of the development of a small, safety-critical train control system, which must be demonstrably safe and secure. You suggest that formal methods should be used in the development of this system, but your manager is skeptical of this approach. Write a report highlighting the benefits of formal methods and presenting a case for their use in this project.

10.10. It has been suggested that the need for regulation inhibits innovation and that regulators force the use of older methods of systems development that have been used on other systems. Discuss whether or not you think this is true and the desirability of regulators imposing their views on what methods should be used.

References

Abrial, J. R. 2009. “Faultless Systems: Yes We Can.” IEEE Computer 42 (9): 30–36. doi:10.1109/MC.2009.283.

Abrial, J. R. 2010. Modeling in Event-B: System and Software Engineering. Cambridge, UK: Cambridge University Press.

Avizienis, A., J. C. Laprie, B. Randell, and C. Landwehr. 2004. “Basic Concepts and Taxonomy of Dependable and Secure Computing.” IEEE Trans. on Dependable and Secure Computing 1 (1): 11–33. doi:10.1109/TDSC.2004.2.

Badeau, F., and A. Amelot. 2005. “Using B as a High Level Programming Language in an Industrial Project: Roissy VAL.” In Proc. ZB 2005: Formal Specification and Development in Z and B. Guildford, UK: Springer. doi:10.1007/11415787_20.

Ball, T., E. Bounimova, B. Cook, V. Levin, J. Lichtenberg, C. McGarvey, B. Ondrusek, S. K. Rajamani, and A. Ustuner. 2006. “Thorough Static Analysis of Device Drivers.” In Proc. EuroSys 2006. Leuven, Belgium. doi:10.1145/1218063.1217943.

Bochot, T., P. Virelizier, H. Waeselynck, and V. Wiels. 2009. “Model Checking Flight Control Systems: The Airbus Experience.” In Proc. 31st International Conf. on Software Engineering, Companion Volume, 18–27. Leipzig: IEEE Computer Society Press. doi:10.1109/ICSE-COMPANION.2009.5070960.

Calinescu, R. C., and M. Z. Kwiatkowska. 2009. “Using Quantitative Analysis to Implement Autonomic IT Systems.” In Proc. 31st International Conf. on Software Engineering, Companion Volume, 100–10. Leipzig: IEEE Computer Society Press. doi:10.1109/ICSE.2009.5070512.

Douglass, B. 2013. “Agile Analysis Practices for Safety-Critical Software Development.” http://www.ibm.com/developerworks/rational/library/agile-analysis-practices-safety-critical-development/.

Hall, A., and R. Chapman. 2002. “Correctness by Construction: Developing a Commercially Secure System.” IEEE Software 19 (1): 18–25. doi:10.1109/52.976937.

Jhala, R., and R. Majumdar. 2009. “Software Model Checking.” Computing Surveys 41 (4), Article 21. doi:10.1145/1592434.1592438.

Miller, S. P., E. A. Anderson, L. G. Wagner, M. W. Whalen, and M. P. E. Heimdahl. 2005. “Formal Verification of Flight Critical Software.” In Proc. AIAA Guidance, Navigation and Control Conference. San Francisco. doi:10.2514/6.2005-6431.

Parnas, D. 2010. “Really Rethinking Formal Methods.” IEEE Computer 43 (1): 28–34. doi:10.1109/MC.2010.22.

Parnas, D., J. van Schouwen, and P. K. Shu. 1990. “Evaluation of Safety-Critical Software.” Comm. ACM 33 (6): 636–651. doi:10.1145/78973.78974.

Swartz, A. J. 1996. “Airport 95: Automated Baggage System?” ACM Software Engineering Notes 21 (2): 79–83. doi:10.1145/227531.227544.

Trimble, J. 2012. “Agile Development Methods for Space Operations.” In SpaceOps 2012. Stockholm. doi:10.2514/6.2012-1264554.

Woodcock, J., P. G. Larsen, J. Bicarregui, and J. Fitzgerald. 2009. “Formal Methods: Practice and Experience.” Computing Surveys 41 (4): 1–36. doi:10.1145/1592434.1592436.

11 Reliability engineering

Objectives

The objective of this chapter is to explain how software reliability may be specified, implemented, and measured. When you have read this chapter, you will:

understand the distinction between software reliability and software availability;

have been introduced to metrics for reliability specification and how these are used to specify measurable reliability requirements;

understand how different architectural styles may be used to implement reliable, fault-tolerant systems architectures;

know about good programming practice for reliable software engineering;

understand how the reliability of a software system may be measured using statistical testing.

Contents

11.1 Availability and reliability
11.2 Reliability requirements
11.3 Fault-tolerant architectures
11.4 Programming for reliability
11.5 Reliability measurement


Our dependence on software systems for almost all aspects of our business and personal lives means that we expect that software to be available when we need it. This may be early in the morning or late at night, at weekends or during holidays—the software must run all day, every day of the year. We expect that software will operate without crashes and failures and will preserve our data and personal information. We need to be able to trust the software that we use, which means that the software must be reliable.

The use of software engineering techniques, better programming languages, and effective quality management has led to significant improvements in software reliability over the past 20 years. Nevertheless, system failures still occur that affect the system’s availability or lead to incorrect results being produced. In situations where software has a particularly critical role—perhaps in an aircraft or as part of the national critical infrastructure—special reliability engineering techniques may be used to achieve the high levels of reliability and availability that are required.

Unfortunately, it is easy to get confused when talking about system reliability, with different people meaning different things when they talk about system faults and failures. Brian Randell, a pioneer researcher in software reliability, defined a fault–error–failure model (Randell 2000) based on the notion that human errors cause faults, faults lead to errors, and errors lead to system failures. He defined these terms precisely:

1. Human error or mistake Human behavior that results in the introduction of faults into a system. For example, in the wilderness weather system, a programmer might decide that the way to compute the time for the next transmission is to add 1 hour to the current time. This works except when the transmission time is between 23.00 and midnight (midnight is 00.00 in the 24-hour clock).

2. System fault A characteristic of a software system that can lead to a system error. The fault in the above example is the inclusion of code to add 1 to a variable called Transmission_time, without a check to see if the value of Transmission_time is greater than or equal to 23.00.

3. System error An erroneous system state during execution that can lead to system behavior that is unexpected by system users. In this example, the value of the variable Transmission_time is set incorrectly to 24.XX rather than 00.XX when the faulty code is executed.

4. System failure An event that occurs at some point in time when the system does not deliver a service as expected by its users. In this case, no weather data is transmitted because the time is invalid.
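The fault–error–failure chain above can be made concrete in code. This is a hypothetical sketch, not the actual wilderness weather system software; the function names and the hour-based time representation are assumptions made for illustration.

```python
# Hypothetical sketch of the transmission-time fault described above.
# Times are hours on the 24-hour clock (0-23).

def next_transmission_faulty(current_hour):
    # FAULT: adds 1 hour with no check for the end-of-day boundary,
    # so an input of 23 produces the invalid state 24 (a system error).
    return current_hour + 1

def next_transmission_fixed(current_hour):
    # Wrapping with modulo 24 keeps the state valid: 23 wraps to 0 (midnight).
    return (current_hour + 1) % 24
```

The fault produces an erroneous state for only 1 of the 24 possible inputs, which illustrates why faulty code may execute many times before any failure is observed.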

System faults do not necessarily result in system errors, and system errors do not necessarily result in system failures:

1. Not all code in a program is executed. The code that includes a fault (e.g., the failure to initialize a variable) may never be executed because of the way that the software is used.


2. Errors are transient. A state variable may have an incorrect value caused by the execution of faulty code. However, before this is accessed and causes a system failure, some other system input may be processed that resets the state to a valid value. The wrong value has no practical effect.

3. The system may include fault detection and protection mechanisms. These ensure that the erroneous behavior is discovered and corrected before the system services are affected.

Another reason why the faults in a system may not lead to system failures is that users adapt their behavior to avoid using inputs that they know cause program failures. Experienced users “work around” software features that they have found to be unreliable. For example, I avoid some features, such as automatic numbering, in the word processing system that I use because my experience is that it often goes wrong. Repairing faults in such unused features makes no practical difference to the system reliability.

The distinction between faults, errors, and failures leads to three complementary approaches that are used to improve the reliability of a system:

1. Fault avoidance The software design and implementation process should use approaches to software development that help avoid design and programming errors and so minimize the number of faults introduced into the system. Fewer faults mean less chance of runtime failures. Fault-avoidance techniques include the use of strongly typed programming languages to allow extensive compiler checking and minimizing the use of error-prone programming language constructs, such as pointers.

2. Fault detection and correction Verification and validation processes are designed to discover and remove faults in a program, before it is deployed for operational use. Critical systems require extensive verification and validation to discover as many faults as possible before deployment and to convince the system stakeholders and regulators that the system is dependable. Systematic testing and debugging and static analysis are examples of fault-detection techniques.

3. Fault tolerance The system is designed so that faults or unexpected system behavior during execution are detected at runtime and are managed in such a way that system failure does not occur. Simple approaches to fault tolerance based on built-in runtime checking may be included in all systems. More specialized fault-tolerance techniques, such as the use of fault-tolerant system architectures, discussed in Section 11.3, may be used when a very high level of system availability and reliability is required.
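As a minimal illustration of the built-in runtime checking mentioned under fault tolerance, the sketch below validates a value before using it and falls back to a known good state. The sensor scenario, function name, and plausibility range are invented for this example.

```python
PLAUSIBLE_RANGE = (-60.0, 60.0)  # assumed valid operating range, degrees Celsius

def accept_reading(raw_value, last_good_value):
    # Runtime check: use the new sensor reading only if it is plausible;
    # otherwise keep the last good value, so that a transient erroneous
    # state does not propagate into a system failure.
    if PLAUSIBLE_RANGE[0] <= raw_value <= PLAUSIBLE_RANGE[1]:
        return raw_value
    return last_good_value
```

A check like this does not remove the underlying fault, but it stops one erroneous value from affecting the service that the system delivers.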

Unfortunately, applying fault-avoidance, fault-detection, and fault-tolerance techniques is not always cost-effective. The cost of finding and removing the remaining faults in a software system rises exponentially as program faults are discovered and removed (Figure 11.1). As the software becomes more reliable, you need to spend more and more time and effort to find fewer and fewer faults. At some stage, even for critical systems, the costs of this additional effort become unjustifiable.


Figure 11.1 The increasing costs of residual fault removal (the cost per error detected rises steeply as the number of residual errors falls from many, through few, to very few)

As a result, software companies accept that their software will always contain some residual faults. The level of faults depends on the type of system. Software products have a relatively high level of faults, whereas critical systems usually have a much lower fault density.

The rationale for accepting faults is that, if and when the system fails, it is cheaper to pay for the consequences of failure than it would be to discover and remove the faults before system delivery. However, the decision to release faulty software is not simply an economic one. The social and political acceptability of system failure must also be taken into account.

11.1 Availability and reliability

In Chapter 10, I introduced the concepts of system reliability and system availability. If we think of systems as delivering some kind of service (to deliver cash, control brakes, or connect phone calls, for example), then the availability of that service is whether or not that service is up and running, and its reliability is whether or not that service delivers correct results. Availability and reliability can both be expressed as probabilities. If the availability is 0.999, this means that, over some time period, the system is available for 99.9% of that time. If, on average, 2 inputs in every 1000 result in failures, then the reliability, expressed as a rate of occurrence of failure, is 0.002.

More precise definitions of availability and reliability are:

1. Reliability The probability of failure-free operation over a specified time, in a given environment, for a specific purpose.

2. Availability The probability that a system, at a point in time, will be operational and able to deliver the requested services.


Figure 11.2 A system as an input/output mapping (a program maps an input set, including the erroneous inputs Ie, to an output set, including the erroneous outputs Oe)

System reliability is not an absolute value—it depends on where and how that system is used. For example, let’s say that you measure the reliability of an application in an office environment where most users are uninterested in the operation of the software. They follow the instructions for its use and do not try to experiment with the system. If you then measure the reliability of the same system in a university environment, then the reliability may be quite different. Here, students may explore the boundaries of the system and use it in unexpected ways. This may result in system failures that did not occur in the more constrained office environment. Therefore, the perceptions of the system’s reliability in each of these environments are different.

The above definition of reliability is based on the idea of failure-free operation, where failures are external events that affect the users of a system. But what constitutes “failure”? A technical definition of failure is behavior that does not conform to the system’s specification. However, there are two problems with this definition:

1. Software specifications are often incomplete or incorrect, and it is left to software engineers to interpret how the system should behave. As they are not domain experts, they may not implement the behavior that users expect. The software may behave as specified, but, for users, it is still failing.

2. No one except system developers reads software specification documents. Users may therefore anticipate that the software should behave in one way when the specification says something completely different.

Failure is therefore not something that can be objectively defined. Rather, it is a judgment made by users of a system. This is one reason why users do not all have the same impression of a system’s reliability.

To understand why reliability is different in different environments, we need to think about a system as an input/output mapping. Figure 11.2 shows a software system that links a set of inputs with a set of outputs. Given an input or input sequence, the program responds by producing a corresponding output. For example, given an input of a URL, a web browser produces an output that is the display of the requested web page.

Figure 11.3 Software usage patterns (the possible inputs of Users 1, 2, and 3; only User 2’s inputs overlap the erroneous input set)

Most inputs do not lead to system failure. However, some inputs or input combinations, shown in the shaded ellipse Ie in Figure 11.2, cause system failures or erroneous outputs to be generated. The program’s reliability depends on the number of system inputs that are members of the set of inputs that lead to an erroneous output—in other words, the set of inputs that cause faulty code to be executed and system errors to occur. If inputs in the set Ie are executed by frequently used parts of the system, then failures will be frequent. However, if the inputs in Ie are executed by code that is rarely used, then users will hardly ever see failures.

Faults that affect the reliability of the system for one user may never show up under someone else’s mode of working. In Figure 11.3, the set of erroneous inputs corresponds to the ellipse labeled Ie in Figure 11.2. The set of inputs produced by User 2 intersects with this erroneous input set. User 2 will therefore experience some system failures. User 1 and User 3, however, never use inputs from the erroneous set. For them, the software will always appear to be reliable.
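The usage-pattern argument of Figures 11.2 and 11.3 can be sketched with sets: a user experiences failures only if their inputs intersect the erroneous input set Ie. The specific inputs below are invented for illustration.

```python
# Invented example inputs; Ie plays the role of the erroneous input set
# in Figure 11.2.
Ie = {"auto_number", "export_pdf"}

user_1_inputs = {"open", "edit", "save"}
user_2_inputs = {"open", "edit", "auto_number", "save"}
user_3_inputs = {"open", "print", "save"}

def sees_failures(user_inputs, erroneous_inputs=Ie):
    # Failures are visible only to users whose inputs overlap Ie.
    return bool(user_inputs & erroneous_inputs)
```

Only User 2’s inputs overlap Ie, so only User 2 perceives the system as unreliable, even though all three users run the same program.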

The availability of a system does not just depend on the number of system failures, but also on the time needed to repair the faults that have caused the failure. Therefore, if system A fails once a year and system B fails once a month, then A is apparently more reliable than B. However, assume that system A takes 6 hours to restart after a failure, whereas system B takes 5 minutes to restart. The availability of system B over the year (60 minutes of downtime) is much better than that of system A (360 minutes of downtime).
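The comparison of systems A and B can be checked with a short calculation over a year of operation. The helper function is a sketch using the figures quoted in the text.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def availability(failures_per_year, repair_minutes_per_failure):
    # Availability is the proportion of the year that the system is up.
    downtime = failures_per_year * repair_minutes_per_failure
    return 1 - downtime / MINUTES_PER_YEAR

avail_a = availability(1, 6 * 60)  # system A: one failure/year, 6-hour restart
avail_b = availability(12, 5)      # system B: monthly failures, 5-minute restart
# System B accumulates only 60 minutes of downtime against A's 360,
# so B has the higher availability despite failing more often.
```

This is why repair time matters as much as failure frequency when specifying availability.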

Furthermore, the disruption caused by unavailable systems is not reflected in the simple availability metric that specifies the percentage of time that the system is available. The time when the system fails is also important. If a system is unavailable for an hour each day between 3 am and 4 am, this may not affect many users. However, if the same system is unavailable for 10 minutes during the working day, system unavailability has a much greater effect on users.

Reliability and availability are closely related, but sometimes one is more important than the other. If users expect continuous service from a system, then the system has a high-availability requirement. It must be available whenever a demand is made. However, if a system can recover quickly from failures without loss of user data, then these failures may not significantly affect system users.

A telephone exchange switch that routes phone calls is an example of a system where availability is more important than reliability. Users expect to be able to make a call when they pick up a phone or activate a phone app, so the system has high-availability requirements. If a system fault occurs while a connection is being set up, this is often quickly recoverable. Exchange or base station switches can reset the system and retry the connection attempt. This can be done quickly, and phone users may not even notice that a failure has occurred. Furthermore, even if a call is interrupted, the consequences are usually not serious. Users simply reconnect if this happens.

11.2 Reliability requirements

In September 1993, a plane landed at Warsaw Airport in Poland during a thunderstorm. For 9 seconds after landing, the brakes on the computer-controlled braking system did not work. The braking system had not recognized that the plane had landed and assumed that the aircraft was still airborne. A safety feature on the aircraft had stopped the deployment of the reverse thrust system, which slows down the aircraft, because reverse thrust is catastrophic if the plane is in the air. The plane ran off the end of the runway, hit an earth bank, and caught fire.

The inquiry into the accident showed that the braking system software had operated according to its specification. There were no errors in the control system. However, the software specification was incomplete and had not taken into account a rare situation, which arose in this case. The software worked, but the system failed.

This incident shows that system dependability does not just depend on good engineering. It also requires attention to detail when the system requirements are derived, and the specification of software requirements that are geared to ensuring the dependability of a system. Those dependability requirements are of two types:

1. Functional requirements, which define checking and recovery facilities that should be included in the system and features that provide protection against system failures and external attacks.

2. Non-functional requirements, which define the required reliability and availability of the system.

As I discussed in Chapter 10, the overall reliability of a system depends on the hardware reliability, the software reliability, and the reliability of the system operators. The system software has to take this into account. As well as including requirements that compensate for software failure, there may also be related reliability requirements to help detect and recover from hardware failures and operator errors.


Availability  Explanation
0.9           The system is available for 90% of the time. This means that, in a 24-hour period (1440 minutes), the system will be unavailable for 144 minutes.
0.99          In a 24-hour period, the system is unavailable for 14.4 minutes.
0.999         The system is unavailable for about 86 seconds in a 24-hour period.
0.9999        The system is unavailable for about 8.6 seconds in a 24-hour period—roughly, one minute per week.

Figure 11.4 Availability specification

11.2.1 Reliability metrics

Reliability can be specified as a probability that a system failure will occur when a system is in use within a specified operating environment. If you are willing to accept, for example, that 1 in any 1000 transactions may fail, then you can specify the failure probability as 0.001. This doesn’t mean that there will be exactly 1 failure in every 1000 transactions. It means that if you observe N thousand transactions, the number of failures that you observe should be about N.

Three metrics may be used to specify reliability and availability:

1. Probability of failure on demand (POFOD) If you use this metric, you define the probability that a demand for service from a system will result in a system failure. So, POFOD = 0.001 means that there is a 1/1000 chance that a failure will occur when a demand is made.

2. Rate of occurrence of failures (ROCOF) This metric sets out the probable number of system failures that are likely to be observed relative to a certain time period (e.g., an hour), or to the number of system executions. In the example above, the ROCOF is 1/1000. The reciprocal of ROCOF is the mean time to failure (MTTF), which is sometimes used as a reliability metric. MTTF is the average number of time units between observed system failures. A ROCOF of two failures per hour implies that the mean time to failure is 30 minutes.

3. Availability (AVAIL) AVAIL is the probability that a system will be operational when a demand is made for service. Therefore, an availability of 0.9999 means that, on average, the system will be available for 99.99% of the operating time. Figure 11.4 shows what different levels of availability mean in practice.
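The three metrics, and the downtime figures of Figure 11.4, can be related in a short sketch. The function names are assumptions for illustration; only the metric names themselves come from the text.

```python
def pofod(observed_failures, demands):
    # Probability of failure on demand: failures per demand for service.
    return observed_failures / demands

def mttf_from_rocof(rocof):
    # MTTF is the reciprocal of ROCOF, in the same time units.
    return 1 / rocof

def daily_downtime_minutes(avail):
    # Minutes of unavailability implied per 24-hour period (Figure 11.4).
    return (1 - avail) * 24 * 60

# A ROCOF of 2 failures/hour gives an MTTF of 0.5 hours (30 minutes);
# an availability of 0.99 implies 14.4 minutes of downtime per day.
```

Note that these are specification-level conversions; measuring the underlying failure data is the subject of Section 11.5.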

POFOD should be used in situations where a failure on demand can lead to a serious system failure. This applies irrespective of the frequency of the demands. For example, a protection system that monitors a chemical reactor and shuts down the reaction if it is overheating should have its reliability specified using POFOD. Generally, demands on a protection system are infrequent as the system is a last line of defense, after all other recovery strategies have failed. Therefore a POFOD of 0.001 (1 failure in 1000 demands) might seem to be risky. However, if there are only two or three demands on the system in its entire lifetime, then the system is unlikely to ever fail.

ROCOF should be used when demands on systems are made regularly rather than intermittently. For example, in a system that handles a large number of transactions, you may specify a ROCOF of 10 failures per day. This means that you are willing to accept that an average of 10 transactions per day will not complete successfully and will have to be canceled and resubmitted. Alternatively, you may specify ROCOF as the number of failures per 1000 transactions.

If the absolute time between failures is important, you may specify the reliability as the mean time to failure (MTTF). For example, if you are specifying the required reliability for a system with long transactions (such as a computer-aided design system), you should use this metric. The MTTF should be much longer than the average time that a user works on his or her models without saving the results. This means that users are unlikely to lose work through a system failure in any one session.

11.2.2 Non-functional reliability requirements

Non-functional reliability requirements are specifications of the required reliability and availability of a system using one of the reliability metrics (POFOD, ROCOF, or AVAIL) described in the previous section. Quantitative reliability and availability specification has been used for many years in safety-critical systems but is uncommon for business-critical systems. However, as more and more companies demand 24/7 service from their systems, it makes sense for them to be precise about their reliability and availability expectations.

Quantitative reliability specification is useful in a number of ways:

1. The process of deciding the required level of reliability helps to clarify what stakeholders really need. It helps stakeholders understand that there are different types of system failure, and it makes clear to them that high levels of reliability are expensive to achieve.

2. It provides a basis for assessing when to stop testing a system. You stop when the system has reached its required reliability level.

3. It is a means of assessing different design strategies intended to improve the reliability of a system. You can make a judgment about how each strategy might lead to the required levels of reliability.

4. If a regulator has to approve a system before it goes into service (e.g., all systems that are critical to flight safety on an aircraft are regulated), then evidence that a required reliability target has been met is important for system certification.

Overspecification of reliability

Overspecification of reliability means defining a level of required reliability that is higher than is really necessary for the practical operation of the software. Overspecification increases development costs disproportionately, because the costs of reducing faults and verifying reliability increase exponentially as reliability increases.

http://software-engineering-book.com/web/over-specifying-reliability/

To avoid incurring excessive and unnecessary costs, it is important that you specify the reliability that you really need rather than simply choose a very high level of reliability for the whole system. You may have different requirements for different parts of the system if some parts are more critical than others. You should follow these three guidelines when specifying reliability requirements:

1. Specify the availability and reliability requirements for different types of failure. There should be a lower probability of high-cost failures than failures that don’t have serious consequences.

2. Specify the availability and reliability requirements for different types of system service. Critical system services should have the highest reliability, but you may be willing to tolerate more failures in less critical services. You may decide that it is only cost-effective to use quantitative reliability specification for the most critical system services.

3. Think about whether high reliability is really required. For example, you may use error-detection mechanisms to check the outputs of a system and have error-correction processes in place to correct errors. There may then be no need for a high level of reliability in the system that generates the outputs, as errors can be detected and corrected.

To illustrate these guidelines, think about the reliability and availability requirements for a bank ATM system that dispenses cash and provides other services to customers. Banks have two concerns with such systems:

1. To ensure that they carry out customer services as requested and that they properly record customer transactions in the account database.

2. To ensure that these systems are available for use when required.

Banks have many years of experience with identifying and correcting incorrect account transactions. They use accounting methods to detect when things have gone wrong. Most transactions that fail can simply be canceled, resulting in no loss to the bank and minor customer inconvenience. Banks that run ATM networks therefore accept that ATM failures may mean that a small number of transactions are incorrect, but they think it more cost-effective to fix these errors later rather than incur high costs in avoiding faulty transactions. Therefore, the absolute reliability required of an ATM may be relatively low. Several failures per day may be acceptable.


For a bank (and for the bank’s customers), the availability of the ATM network is more important than whether or not individual ATM transactions fail. Lack of availability means increased demand on counter services, customer dissatisfaction, engineering costs to repair the network, and so on. Therefore, for transaction-based systems such as banking and e-commerce systems, the focus of reliability specification is usually on specifying the availability of the system.

To specify the availability of an ATM network, you should identify the system services and specify the required availability for each of these services, notably:

- the customer account database service; and

- the individual services provided by an ATM, such as “withdraw cash” and “provide account information.”

The database service is the most critical, as failure of this service means that all of the ATMs in the network are out of action. Therefore, you should specify this service to have a high level of availability. In this case, an acceptable figure for database availability (ignoring issues such as scheduled maintenance and upgrades) would probably be around 0.9999, between 7 am and 11 pm. This means a downtime of less than 1 minute per week.

For an individual ATM, the overall availability depends on mechanical reliability and the fact that it can run out of cash. Software issues are probably less significant than these factors. Therefore, a lower level of software availability for the ATM software is acceptable. The overall availability of the ATM software might therefore be specified as 0.999, which means that a machine might be unavailable for between 1 and 2 minutes each day. This allows for the ATM software to be restarted in the event of a problem.
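The relationship between an availability figure and the downtime it permits can be checked with a short calculation. This is a sketch: the 16-hour daily service window for the database and the 24-hour window for an individual ATM are the assumptions used in the examples above.

```python
# Convert an availability requirement into the downtime it permits
# over a given service window.

def permitted_downtime_minutes(availability: float, window_hours: float) -> float:
    """Downtime (in minutes) allowed by an availability figure over a window."""
    return (1.0 - availability) * window_hours * 60.0

# Database service: 0.9999 available over a 16-hour day, measured per week.
db_downtime = permitted_downtime_minutes(0.9999, 16 * 7)
print(f"Database downtime: {db_downtime:.2f} minutes/week")   # ~0.67, i.e. < 1 minute

# ATM software: 0.999 available over a full day.
atm_downtime = permitted_downtime_minutes(0.999, 24)
print(f"ATM downtime: {atm_downtime:.2f} minutes/day")        # ~1.44, i.e. 1-2 minutes
```

Both results match the figures quoted in the text: under a minute per week for the database service and between 1 and 2 minutes per day for an individual ATM.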

The reliability of control systems is usually specified in terms of the probability that the system will fail when a demand is made (POFOD). Consider the reliability requirements for the control software in the insulin pump, introduced in Chapter 1. This system delivers insulin a number of times per day and monitors the user’s blood glucose several times per hour.

There are two possible types of failure in the insulin pump:

1. Transient software failures, which can be repaired by user actions such as resetting or recalibrating the machine. For these types of failure, a relatively low value of POFOD (say 0.002) may be acceptable. This means that one failure may occur in every 500 demands made on the machine. This is approximately once every 3.5 days, because the blood sugar is checked about 5 times per hour.

2. Permanent software failures, which require the software to be reinstalled by the manufacturer. The probability of this type of failure should be much lower. Roughly once a year is the minimum figure, so the POFOD should be no more than 0.00002.
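The POFOD figures above can be derived from an acceptable failure frequency and the demand rate. A minimal sketch, using the text’s assumption of about 5 glucose checks per hour; note that counting glucose checks alone gives a slightly longer interval than the 3.5 days quoted, which also reflects insulin-delivery demands:

```python
# Derive a POFOD target from an acceptable failure rate and a demand rate.

def pofod_target(acceptable_failures_per_year: float, demands_per_day: float) -> float:
    """POFOD needed so that expected failures stay within the acceptable rate."""
    demands_per_year = demands_per_day * 365
    return acceptable_failures_per_year / demands_per_year

# Blood glucose is checked about 5 times per hour -> 120 demands/day.
demands_per_day = 5 * 24

# Transient failures: POFOD 0.002 means one failure per 500 demands,
# i.e. roughly every 4 days from glucose checks alone.
print(500 / demands_per_day)

# Permanent failures: at most one per year.
print(pofod_target(1, demands_per_day))   # ~2.3e-05, close to the 0.00002 in the text
```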


RR1: A predefined range for all operator inputs shall be defined, and the system shall check that all operator inputs fall within this predefined range. (Checking)

RR2: Copies of the patient database shall be maintained on two separate servers that are not housed in the same building. (Recovery, redundancy)

RR3: N-version programming shall be used to implement the braking control system. (Redundancy)

RR4: The system must be implemented in a safe subset of Ada and checked using static analysis. (Process)

Figure 11.5 Examples of functional reliability requirements

Failure to deliver insulin does not have immediate safety implications, so commercial factors rather than safety factors govern the level of reliability required. Service costs are high because users need fast repair and replacement. It is in the manufacturer’s interest to limit the number of permanent failures that require repair.

11.2.3 Functional reliability specification

To achieve a high level of reliability and availability in a software-intensive system, you use a combination of fault-avoidance, fault-detection, and fault-tolerance techniques. This means that functional reliability requirements have to be generated which specify how the system should provide fault avoidance, detection, and tolerance.

These functional reliability requirements should specify the faults to be detected and the actions to be taken to ensure that these faults do not lead to system failures. Functional reliability specification, therefore, involves analyzing the non-functional requirements (if these have been specified), assessing the risks to reliability, and specifying system functionality to address these risks.

There are four types of functional reliability requirements:

1. Checking requirements These requirements identify checks on inputs to the system to ensure that incorrect or out-of-range inputs are detected before they are processed by the system.

2. Recovery requirements These requirements are geared to helping the system recover after a failure has occurred. These requirements are usually concerned with maintaining copies of the system and its data and specifying how to restore system services after failure.

3. Redundancy requirements These specify redundant features of the system that ensure that a single component failure does not lead to a complete loss of service. I discuss this in more detail in the next chapter.

4. Process requirements These are fault-avoidance requirements, which ensure that good practice is used in the development process. The practices specified should reduce the number of faults in a system.

Some examples of these types of reliability requirement are shown in Figure 11.5.


There are no simple rules for deriving functional reliability requirements. Organizations that develop critical systems usually have organizational knowledge about possible reliability requirements and how these requirements reflect the actual reliability of a system. These organizations may specialize in specific types of systems, such as railway control systems, so the reliability requirements can be reused across a range of systems.

11.3 Fault-tolerant architectures

Fault tolerance is a runtime approach to dependability in which systems include mechanisms to continue in operation, even after a software or hardware fault has occurred and the system state is erroneous. Fault-tolerance mechanisms detect and correct this erroneous state so that the occurrence of a fault does not lead to a system failure. Fault tolerance is required in systems that are safety or security critical and where the system cannot move to a safe state when an error is detected.

To provide fault tolerance, the system architecture has to be designed to include redundant and diverse hardware and software. Examples of systems that may need fault-tolerant architectures are aircraft systems that must be available throughout the duration of the flight, telecommunication systems, and critical command and control systems.

The simplest realization of a dependable architecture is in replicated servers, where two or more servers carry out the same task. Requests for processing are channeled through a server management component that routes each request to a particular server. This component also keeps track of server responses. In the event of server failure, which can be detected by a lack of response, the faulty server is switched out of the system. Unprocessed requests are resubmitted to other servers for processing.

This replicated server approach is widely used for transaction processing systems, where it is easy to maintain copies of transactions to be processed. Transaction processing systems are designed so that data is only updated once a transaction has finished correctly. Delays in processing do not affect the integrity of the system. It can be an efficient way of using hardware if the backup server is one that is normally used for low-priority tasks. If a problem occurs with a primary server, its unprocessed transactions are transferred to the backup server, which gives that work the highest priority.
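The server management behavior described above can be sketched in a few lines. This is an illustrative skeleton, not a production design: the server names, the handler functions, and the use of an exception to signal “no usable response” are all assumptions for the example.

```python
# Sketch of the replicated-server pattern: a management component routes
# each request to a server, detects failure by the absence of a successful
# response, switches the faulty server out, and resubmits the request to
# the remaining servers.

class ServerPool:
    def __init__(self, servers):
        self.active = list(servers)   # list of (name, handler) pairs

    def submit(self, request):
        while self.active:
            name, handler = self.active[0]
            try:
                return handler(request)
            except Exception:
                # No response from this server: switch it out of the
                # system and resubmit the request to the next server.
                self.active.pop(0)
        raise RuntimeError("all replicated servers have failed")

def failing(request):
    raise ConnectionError("server down")

pool = ServerPool([("s1", failing), ("s2", lambda r: f"done:{r}")])
print(pool.submit("tx-42"))   # s1 fails silently; s2 processes -> done:tx-42
```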

Replicated servers provide redundancy but not usually diversity. The server hardware is usually identical, and the servers run the same version of the software. Therefore, they can cope with hardware failures and software failures that are localized to a single machine. They cannot cope with software design problems that cause all versions of the software to fail at the same time. To handle software design failures, a system has to use diverse software and hardware.

Torres-Pomales surveys a range of software fault-tolerance techniques (Torres-Pomales 2000), and Pullum (Pullum 2001) describes different types of fault-tolerant architecture. In the following sections, I describe three architectural patterns that have been used in fault-tolerant systems.


Figure 11.6 Protection system architecture

11.3.1 Protection systems

A protection system is a specialized system that is associated with some other system. This is usually a control system for some process, such as a chemical manufacturing process, or an equipment control system, such as the system on a driverless train. An example of a protection system might be a system on a train that detects if the train has gone through a red signal. If there is no indication that the train control system is slowing down the train, then the protection system automatically applies the train brakes to bring it to a halt. Protection systems independently monitor their environment. If sensors indicate a problem that the controlled system is not dealing with, then the protection system is activated to shut down the process or equipment.

Figure 11.6 illustrates the relationship between a protection system and a controlled system. The protection system monitors both the controlled equipment and the environment. If a problem is detected, it issues commands to the actuators to shut down the system or invoke other protection mechanisms such as opening a pressure-release valve. Notice that there are two sets of sensors. One set is used for normal system monitoring and the other specifically for the protection system. In the event of sensor failure, backups are in place that will allow the protection system to continue in operation. The system may also have redundant actuators.

A protection system only includes the critical functionality that is required to move the system from a potentially unsafe state to a safe state (which could be system shutdown). It is an instance of a more general fault-tolerant architecture in which a principal system is supported by a smaller and simpler backup system that only includes essential functionality. For example, the control software for the U.S. Space Shuttle had a backup system with “get you home” functionality. That is, the backup system could land the vehicle if the principal control system failed but had no other control functions.


Figure 11.7 Self-monitoring architecture

The advantage of this architectural style is that protection system software can be much simpler than the software that is controlling the protected process. The only function of the protection system is to monitor operation and to ensure that the system is brought to a safe state in the event of an emergency. Therefore, it is possible to invest more effort in fault avoidance and fault detection. You can check that the software specification is correct and consistent and that the software is correct with respect to its specification. The aim is to ensure that the reliability of the protection system is such that it has a very low probability of failure on demand (say, 0.001). Given that demands on the protection system should be rare, a probability of failure on demand of 1/1000 means that protection system failures should be very rare.
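The monitoring logic of a protection system is deliberately simple, which is what makes it easier to verify. The following sketch illustrates one monitoring cycle; the safety limit, the sensor reading, and the command names are invented for the example and do not come from the text.

```python
# Sketch of one monitoring cycle of a protection system: it watches its
# own sensors and, if the controlled system is not already handling a
# dangerous condition, commands a move to a safe state (here, shutdown).

SAFE_LIMIT = 100.0   # hypothetical maximum safe value, e.g. pressure

def protection_step(sensor_reading: float, control_system_acting: bool) -> str:
    """Return the command issued by the protection system for one cycle."""
    if sensor_reading > SAFE_LIMIT and not control_system_acting:
        return "shut_down"      # move the process to a safe state
    return "no_action"          # normal operation, or the controller is handling it

print(protection_step(120.0, control_system_acting=False))  # shut_down
print(protection_step(120.0, control_system_acting=True))   # no_action
```

The point of the sketch is that the entire decision fits in a few lines, so fault avoidance and verification effort can plausibly achieve the very low POFOD required.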

11.3.2 Self-monitoring architectures

A self-monitoring architecture (Figure 11.7) is a system architecture in which the system is designed to monitor its own operation and to take some action if a problem is detected. Computations are carried out on separate channels, and the outputs of these computations are compared. If the outputs are identical and are available at the same time, then the system is judged to be operating correctly. If the outputs are different, then a failure is assumed. When this occurs, the system raises a failure exception on the status output line. This signals that control should be transferred to some other system.

To be effective in detecting both hardware and software faults, self-monitoring systems have to be designed so that:

1. The hardware used in each channel is diverse. In practice, this might mean that each channel uses a different processor type to carry out the required computations, or the chipset making up the system may be sourced from different manufacturers. This reduces the probability of common processor design faults affecting the computation.

2. The software used in each channel is diverse. Otherwise, the same software error could arise at the same time on each channel.
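The splitter/comparator arrangement of Figure 11.7 can be sketched as follows. The two channel functions stand in for diverse implementations; the exception class models the failure exception raised on the status line.

```python
# Sketch of the self-monitoring architecture: the input is split to two
# channels, their outputs are compared, and a mismatch raises a failure
# exception on the "status" path so control can be transferred elsewhere.

class ChannelMismatch(Exception):
    """Raised on the status line when the two channels disagree."""

def self_monitoring(channel1, channel2):
    def compute(value):
        out1 = channel1(value)
        out2 = channel2(value)
        if out1 != out2:
            raise ChannelMismatch(f"channels disagree: {out1} vs {out2}")
        return out1          # outputs agree: release the result
    return compute

# Two (deliberately different) ways of computing the same result.
square_a = lambda x: x * x
square_b = lambda x: x ** 2

checked = self_monitoring(square_a, square_b)
print(checked(9))   # 81: channels agree, so the result is released
```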

On its own, this architecture may be used in situations where it is important for computations to be correct, but where availability is not essential. If the answers from each channel differ, the system shuts down. For many medical treatment and diagnostic systems, reliability is more important than availability because an incorrect system response could lead to the patient receiving incorrect treatment. However, if the system shuts down in the event of an error, this is an inconvenience, but the patient will not usually be harmed.

Figure 11.8 The Airbus flight control system architecture

In situations that require high availability, you have to use several self-checking systems in parallel. You need a switching unit that detects faults and selects a result from one of the systems, where both channels are producing a consistent response. This approach is used in the flight control system for the Airbus 340 series of aircraft, which uses five self-checking computers. Figure 11.8 is a simplified diagram of the Airbus flight control system that shows the organization of the self-monitoring systems.

In the Airbus flight control system, each of the flight control computers carries out the computations in parallel, using the same inputs. The outputs are connected to hardware filters that detect if the status indicates a fault and, if so, switch off the output from that computer. The output is then taken from an alternative system. Therefore, it is possible for four computers to fail and for the aircraft operation to continue. In more than 15 years of operation, there have been no reports of situations where control of the aircraft has been lost due to total flight control system failure.


Figure 11.9 Triple modular redundancy

The designers of the Airbus system have tried to achieve diversity in a number of different ways:

1. The primary flight control computers use a different processor from the secondary flight control systems.

2. The chipset that is used in each channel in the primary and secondary systems is supplied by a different manufacturer.

3. The software in the secondary flight control systems provides critical functionality only; it is less complex than the primary software.

4. The software for each channel in both the primary and the secondary systems is developed using different programming languages and by different teams.

5. Different programming languages are used in the secondary and primary systems.

As I discuss in Section 11.3.4, these measures do not guarantee diversity, but they reduce the probability of common failures in different channels.

11.3.3 N-version programming

Self-monitoring architectures are examples of systems in which multiversion programming is used to provide software redundancy and diversity. This notion of multiversion programming has been derived from hardware systems, where the notion of triple modular redundancy (TMR) has been used for many years to build systems that are tolerant of hardware failures (Figure 11.9).

In a TMR system, the hardware unit is replicated three (or sometimes more) times. The output from each unit is passed to an output comparator that is usually implemented as a voting system. This system compares all of its inputs, and, if two or more are the same, then that value is output. If one of the units fails and does not produce the same output as the other units, its output is ignored. A fault manager may try to repair the faulty unit automatically, but if this is impossible, the system is automatically reconfigured to take the unit out of service. The system then continues to function with two working units.
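The voting comparator at the heart of TMR can be sketched in a few lines. This is an illustration of the majority-vote idea only; a real comparator is a hardware component.

```python
# Sketch of the TMR output comparator: three replicated units each produce
# an output; if at least two agree, that value is the system output and
# any disagreeing unit is reported as faulty.

from collections import Counter

def tmr_vote(outputs):
    """Majority-vote over unit outputs; returns (value, faulty unit indices)."""
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one unit has failed")
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty

# Unit 1 has failed and produces a wrong answer; units 0 and 2 agree.
print(tmr_vote([42, 17, 42]))   # (42, [1]): 42 is output, unit 1 flagged
```

A fault manager would use the returned indices to attempt repair or to reconfigure the system with the faulty unit switched out.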

This approach to fault tolerance relies on most hardware failures being the result of component failure rather than design faults. The components are therefore likely to fail independently. It assumes that, when fully operational, all hardware units perform to specification. There is therefore a low probability of simultaneous component failure in all hardware units.

Figure 11.10 N-version programming

Of course, the components could all have a common design fault and thus all produce the same (wrong) answer. Using hardware units that have a common specification but that are designed and built by different manufacturers reduces the chances of such a common-mode failure. It is assumed that the probability of different teams making the same design or manufacturing error is small.

A similar approach can be used for fault-tolerant software, where N diverse versions of a software system execute in parallel (Avizienis 1995). This approach to software fault tolerance, illustrated in Figure 11.10, has been used in railway signaling systems, aircraft systems, and reactor protection systems.

Using a common specification, the same software system is implemented by a number of teams. These versions are executed on separate computers. Their outputs are compared using a voting system, and inconsistent outputs or outputs that are not produced in time are rejected. At least three versions of the system should be available so that two versions should be consistent in the event of a single failure.

N-version programming may be less expensive than self-checking architectures in systems for which a high level of availability is required. However, it still requires several different teams to develop different versions of the software. This leads to very high software development costs. As a result, this approach is only used in systems where it is impractical to provide a protection system that can guard against safety-critical failures.

11.3.4 Software diversity

All of the above fault-tolerant architectures rely on software diversity to achieve fault tolerance. This is based on the assumption that diverse implementations of the same specification (or a part of the specification, for protection systems) are independent. They should not include common errors and so will not fail in the same way, at the same time. The software should therefore be written by different teams who should not communicate during the development process. This requirement reduces the chances of common misunderstandings or misinterpretations of the specification.


The company that is procuring the system may include explicit diversity policies that are intended to maximize the differences between the system versions. For example:

1. By including requirements that different design methods should be used. For example, one team may be required to produce an object-oriented design, and another team may produce a function-oriented design.

2. By stipulating that the programs should be implemented using different programming languages. For example, in a three-version system, Ada, C++, and Java could be used to write the software versions.

3. By requiring the use of different tools and development environments for the system.

4. By requiring different algorithms to be used in some parts of the implementation. However, this limits the freedom of the design team and may be difficult to reconcile with system performance requirements.

Ideally, the diverse versions of the system should have no dependencies and so should fail in completely different ways. If this is the case, then the overall reliability of a diverse system is obtained by multiplying the reliabilities of each channel. So, if each channel has a probability of failure on demand of 0.001, then the overall POFOD of a three-channel system (with all channels independent) is a million times smaller than the POFOD of a single-channel system.
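The arithmetic behind this claim is simply the multiplication of independent failure probabilities:

```python
# Under the (idealized) independence assumption, the POFOD of the diverse
# system is the product of the channel POFODs.

channel_pofod = 0.001
three_channel_pofod = channel_pofod ** 3   # ~1e-9

# The failure probability falls by a factor of a million relative to one channel.
improvement = channel_pofod / three_channel_pofod
print(f"{three_channel_pofod:.0e}, improvement factor {improvement:.0f}")
```

As the next paragraph explains, this factor is never achieved in practice, because the channels are not truly independent.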

In practice, however, achieving complete channel independence is impossible. It has been shown experimentally that independent software design teams often make the same mistakes or misunderstand the same parts of the specification (Brilliant, Knight, and Leveson 1990; Leveson 1995). There are several reasons for this misunderstanding:

1. Members of different teams are often from the same cultural background and may have been educated using the same approach and textbooks. This means that they may find the same things difficult to understand and have common difficulties in communicating with domain experts. It is quite possible that they will, independently, make the same mistakes and design the same algorithms to solve a problem.

2. If the requirements are incorrect or they are based on misunderstandings about the environment of the system, then these mistakes will be reflected in each implementation of the system.

3. In a critical system, the detailed system specification that is derived from the system’s requirements should provide an unambiguous definition of the system’s behavior. However, if the specification is ambiguous, then different teams may misinterpret the specification in the same way.

One way to reduce the possibility of common specification errors is to develop detailed specifications for the system independently and to define the specifications in different languages. One development team might work from a formal specification, another from a state-based system model, and a third from a natural language specification. This approach helps avoid some errors of specification interpretation, but it does not get around the problem of requirements errors. It also introduces the possibility of errors in the translation of the requirements, leading to inconsistent specifications.

In an analysis of the experiments, Hatton (Hatton 1997) concluded that a three-channel system was somewhere between 5 and 9 times more reliable than a single-channel system. He concluded that improvements in reliability that could be obtained by devoting more resources to a single version could not match this, and so N-version approaches were more likely to lead to more reliable systems than single-version approaches.

What is unclear, however, is whether the improvements in reliability from a multiversion system are worth the extra development costs. For many systems, the extra costs may not be justifiable, as a well-engineered single-version system may be good enough. It is only in safety- and mission-critical systems, where the costs of failure are very high, that multiversion software may be required. Even in such situations (e.g., a spacecraft system), it may be enough to provide a simple backup with limited functionality until the principal system can be repaired and restarted.

11.4 Programming for reliability

I have deliberately focused in this book on programming-language-independent aspects of software engineering. It is almost impossible to discuss programming without getting into the details of a specific programming language. However, when considering reliability engineering, there is a set of accepted good programming practices that are fairly universal and that help reduce faults in delivered systems.

A list of eight good practice guidelines is shown in Figure 11.11. They can be applied regardless of the particular programming language used for systems development, although the way they are used depends on the specific languages and notations that are used for system development. Following these guidelines also reduces the chances of introducing security-related vulnerabilities into programs.

Guideline 1: Control the visibility of information in a program

A security principle that is adopted by military organizations is the “need to know” principle. Only those individuals who need to know a particular piece of information in order to carry out their duties are given that information. Information that is not directly relevant to their work is withheld.

When programming, you should adopt an analogous principle to control access to the variables and data structures that you use. Program components should only be allowed access to data that they need for their implementation. Other program data should be inaccessible and hidden from them. If you hide information, it cannot be corrupted by program components that are not supposed to use it. If the interface remains the same, the data representation may be changed without affecting other components in the system.


Dependable programming guidelines

1. Limit the visibility of information in a program.
2. Check all inputs for validity.
3. Provide a handler for all exceptions.
4. Minimize the use of error-prone constructs.
5. Provide restart capabilities.
6. Check array bounds.
7. Include timeouts when calling external components.
8. Name all constants that represent real-world values.

Figure 11.11 Good practice guidelines for dependable programming

You can achieve this by implementing data structures in your program as abstract data types. An abstract data type is one in which the internal structure and representation of a variable of that type are hidden. The structure and attributes of the type are not externally visible, and all access to the data is through operations.

For example, you might have an abstract data type that represents a queue of requests for service. Operations should include get and put, which add and remove items from the queue, and an operation that returns the number of items in the queue. You might initially implement the queue as an array but subsequently decide to change the implementation to a linked list. This can be achieved without any changes to code using the queue, because the queue representation is never directly accessed.

In some object-oriented languages, you can implement abstract data types using interface definitions, where you declare the interface to an object without reference to its implementation. For example, you can define an interface Queue, which supports methods to place objects onto the queue, remove them from the queue, and query the size of the queue. In the object class that implements this interface, the attributes and methods should be private to that class.
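The queue example above can be sketched as follows (in Python here, though the guideline is language-independent). The class and method names are chosen for the illustration; the point is that clients only use the operations, so the hidden list could be replaced by a linked list without changing any client code.

```python
# Sketch of the request-queue abstract data type: all access goes through
# the put/get/size operations, and the representation (a list here) is
# hidden behind them and never accessed directly by clients.

class RequestQueue:
    def __init__(self):
        self._items = []          # hidden representation; could be a linked list

    def put(self, item):
        """Add an item to the back of the queue."""
        self._items.append(item)

    def get(self):
        """Remove and return the item at the front of the queue."""
        return self._items.pop(0)

    def size(self):
        """Return the number of items in the queue."""
        return len(self._items)

q = RequestQueue()
q.put("req-1")
q.put("req-2")
print(q.get())    # req-1: first in, first out
print(q.size())   # 1
```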

Guideline 2: Check all inputs for validity

All programs take inputs from their environment and process them. The specification makes assumptions about these inputs that reflect their real-world use. For example, it may be assumed that a bank account number is always an eight-digit positive integer. In many cases, however, the system specification does not define what actions should be taken if the input is incorrect. Inevitably, users will make mistakes and will sometimes enter the wrong data. As I discuss in Chapter 13, malicious attacks on a system may rely on deliberately entering invalid information. Even when inputs come from sensors or other systems, these systems can go wrong and provide incorrect values.

You should therefore always check the validity of inputs as soon as they are read from the program’s operating environment. The checks involved obviously depend on the inputs themselves, but possible checks that may be used are:

1. Range checks You may expect inputs to be within a particular range. For

example, an input that represents a probability should be within the range

0.0 to 1.0;

an input that represents the temperature of a liquid water should be

between 0

degrees Celsius and 100 degrees Celsius, and so on.

11.4 Programming for reliability 327

2. Size checks You may expect inputs to be a given number of characters, for example, 8 characters to represent a bank account. In other cases, the size may not be fixed, but there may be a realistic upper limit. For example, it is unlikely that a person's name will have more than 40 characters.

3. Representation checks You may expect an input to be of a particular type, which is represented in a standard way. For example, people's names do not include numeric characters, email addresses are made up of two parts separated by a @ sign, and so on.

4. Reasonableness checks Where an input is one of a series and you know something about the relationships between the members of the series, then you can check that an input value is reasonable. For example, if the input value represents the readings of a household electricity meter, then you would expect the amount of electricity used to be approximately the same as in the corresponding period in the previous year. Of course, there will be variations, but order-of-magnitude differences suggest that something has gone wrong.

The actions that you take if an input validation check fails depend on the type of system being implemented. In some cases, you report the problem to the user and request that the value is re-input. Where a value comes from a sensor, you might use the most recent valid value. In embedded real-time systems, you might have to estimate the value based on previous data, so that the system can continue in operation.
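The checks above can be combined in a small validation function. The eight-digit account rule comes from the text; the function name and error-handling policy are illustrative:

```python
import re

def validate_account_number(raw: str) -> int:
    """Validate a bank account number read from the environment,
    combining representation, size, and range checks."""
    if not re.fullmatch(r"\d{8}", raw):        # representation + size check
        raise ValueError(f"account number must be 8 digits: {raw!r}")
    value = int(raw)
    if value <= 0:                             # range check: positive integer
        raise ValueError("account number must be positive")
    return value
```

Validating as soon as the value is read means the rest of the program can assume a well-formed account number.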

Guideline 3: Provide a handler for all exceptions

During program execution, errors or unexpected events inevitably occur. These may arise because of a program fault, or they may be a result of unpredictable external circumstances. An error or an unexpected event that occurs during the execution of a program is called an exception. Examples of exceptions might be a system power failure, an attempt to access nonexistent data, or numeric overflow or underflow.

Exceptions may be caused by hardware or software conditions. When an exception occurs, it must be managed by the system. This can be done within the program itself, or it may involve transferring control to a system exception-handling mechanism. Typically, the system's exception management mechanism reports the error and shuts down execution. Therefore, to ensure that program exceptions do not cause system failure, you should define an exception handler for all possible exceptions that may arise; you should also make sure that all exceptions are detected and explicitly handled.

Languages such as Java, C++, and Python have built-in exception-handling constructs. When an exceptional situation occurs, the exception is signaled and the language runtime system transfers control to an exception handler. This is a code section that states exception names and appropriate actions to handle each exception (Figure 11.12). The exception handler is outside the normal flow of control, and this normal control flow does not resume after the exception has been handled.

Figure 11.12 Exception handling (the normal flow of control through a code section is interrupted when an exception is detected; control transfers to the exception-handling code instead of continuing to the normal exit)

An exception handler usually does one of three things:

1. Signals to a higher-level component that an exception has occurred and provides information to that component about the type of exception. You use this approach when one component calls another and the calling component needs to know if the called component has executed successfully. If not, it is up to the calling component to take action to recover from the problem.

2. Carries out some alternative processing to that which was originally intended. Therefore, the exception handler takes some actions to recover from the problem. Processing may then continue as normal. Alternatively, the exception handler may indicate that an exception has occurred so that a calling component is aware of and can deal with the exception.

3. Passes control to the programming language runtime support system that handles the exception. This is often the default when faults occur in a program, for example, when a numeric value overflows. The usual action of the runtime system is to halt processing. You should only use this approach when it is possible to move the system to a safe and quiescent state before handing over control to the runtime system.
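The first two options can be sketched in a few lines of Python. The sensor-reading scenario and function names are invented for illustration:

```python
def read_sensor(raw: str) -> float:
    """Option 1: signal failure to the calling component."""
    try:
        return float(raw)
    except ValueError as e:
        # Re-raise so the caller knows the call did not succeed
        # and can take its own recovery action.
        raise RuntimeError(f"bad sensor reading {raw!r}") from e

LAST_GOOD = 20.0

def read_sensor_or_last_good(raw: str) -> float:
    """Option 2: alternative processing -- fall back to the most
    recent valid value so that operation can continue."""
    try:
        return float(raw)
    except ValueError:
        return LAST_GOOD
```

Option 3, doing nothing and letting the runtime halt the program, needs no code, which is exactly why it should only be used when the system is already in a safe state.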

Handling exceptions within a program makes it possible to detect and recover from some input errors and unexpected external events. As such, it provides a degree of fault tolerance. The program detects faults and can take action to recover from them. As most input errors and unexpected external events are usually transient, it is often possible to continue normal operation after the exception has been processed.

Guideline 4: Minimize the use of error-prone constructs

Faults in programs, and therefore many program failures, are usually a consequence of human error. Programmers make mistakes because they lose track of the numerous relationships between the state variables. They write program statements that result in unexpected behavior and system state changes. People will always make mistakes, but in the late 1960s it became clear that some approaches to programming were more likely to introduce errors into a program than others.

Error-prone constructs

Some programming language features are more likely than others to lead to the introduction of program bugs. Program reliability is likely to be improved if you avoid using these constructs. Wherever possible, you should minimize the use of goto statements, floating-point numbers, pointers, dynamic memory allocation, parallelism, recursion, interrupts, aliasing, unbounded arrays, and default input processing.

http://software-engineering-book.com/web/error-prone-constructs/

For example, you should try to avoid using floating-point numbers because the precision of floating-point numbers is limited by their hardware representation. Comparisons of very large or very small numbers are unreliable. Another construct that is potentially error-prone is dynamic storage allocation, where you explicitly manage storage in the program. It is very easy to forget to release storage when it's no longer needed, and this can lead to hard-to-detect runtime errors.

Some standards for safety-critical systems development completely prohibit the use of error-prone constructs. However, such an extreme position is not normally practical. All of these constructs and techniques are useful, though they must be used with care. Wherever possible, their potentially dangerous effects should be controlled by using them within abstract data types or objects. These act as natural "firewalls" limiting the damage caused if errors occur.

Guideline 5: Provide restart capabilities

Many organizational information systems are based on short transactions where processing user inputs takes a relatively short time. These systems are designed so that changes to the system's database are only finalized after all other processing has been successfully completed. If something goes wrong during processing, the database is not updated and so does not become inconsistent. Virtually all e-commerce systems, where you only commit to your purchase on the final screen, work in this way.

User interactions with e-commerce systems usually last a few minutes and involve minimal processing. Database transactions are short and are usually completed in less than a second. However, other types of system such as CAD systems and word processing systems involve long transactions. In a long transaction system, the time between starting to use the system and finishing work may be several minutes or hours. If the system fails during a long transaction, then all of the work may be lost. Similarly, in computationally intensive systems such as some e-science systems, minutes or hours of processing may be required to complete the computation. All of this time is lost in the event of a system failure.

In all of these types of systems, you should provide a restart capability that is based on keeping copies of data collected or generated during processing. The restart facility should allow the system to restart using these copies, rather than having to start all over from the beginning. These copies are sometimes called checkpoints. For example:

1. In an e-commerce system, you can keep copies of forms filled in by a user and allow them to access and submit these forms without having to fill them in again.

2. In a long transaction or computationally intensive system, you can automatically save data every few minutes and, in the event of a system failure, restart with the most recently saved data. You should also allow for user error and provide a way for users to go back to the most recent checkpoint and start again from there.

If an exception occurs and it is impossible to continue normal operation, you can handle the exception using backward error recovery. This means that you reset the state of the system to the saved state in the checkpoint and restart operation from that point.
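A toy sketch of checkpoint-based backward error recovery. The file location, state shape, and function names are illustrative; a real system would checkpoint on a timer and verify the saved data:

```python
import json
import os
import tempfile

CHECKPOINT = os.path.join(tempfile.gettempdir(), "work.checkpoint.json")

def save_checkpoint(state: dict) -> None:
    """Periodically persist a copy of the working state."""
    with open(CHECKPOINT, "w") as f:
        json.dump(state, f)

def restart() -> dict:
    """Backward error recovery: resume from the most recent
    checkpoint instead of starting from the beginning."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"progress": 0}    # no checkpoint yet -- start fresh
```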

Guideline 6: Check array bounds

All programming languages allow the specification of arrays: sequential data structures that are accessed using a numeric index. These arrays are usually laid out in contiguous areas within the working memory of a program. Arrays are specified to be of a particular size, which reflects how they are used. For example, if you wish to represent the ages of up to 10,000 people, then you might declare an array with 10,000 locations to hold the age data.

Some programming languages, such as Java, always check that when a value is entered into an array, the index is within that array. So, if an array A is indexed from 0 to 10,000, an attempt to enter values into elements A[-5] or A[12345] will lead to an exception being raised. However, programming languages such as C and C++ do not automatically include array bound checks and simply calculate an offset from the beginning of the array. Therefore, A[12345] would access the word that was 12345 locations from the beginning of the array, irrespective of whether or not this was part of the array.

These languages do not include automatic array bound checking because this introduces an overhead every time the array is accessed and so it increases program execution time. However, the lack of bound checking leads to security vulnerabilities, such as buffer overflow, which I discuss in Chapter 13. More generally, it introduces a system vulnerability that can lead to system failure. If you are using a language such as C or C++ that does not include array bound checking, you should always include checks that the array index is within bounds.
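The explicit check recommended above can be written as a small guard. This sketch is in Python rather than C or C++, but the pattern is the same (the function name is illustrative):

```python
def store_age(ages: list, index: int, value: int) -> None:
    """Explicit bound check before an indexed write. Even in Python,
    which raises IndexError for an out-of-range index, the check adds
    value: a plain ages[-5] would silently wrap around to the end."""
    if not 0 <= index < len(ages):
        raise IndexError(f"index {index} outside 0..{len(ages) - 1}")
    ages[index] = value
```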

Guideline 7: Include timeouts when calling external components

In distributed systems, components of the system execute on different computers, and calls are made across the network from component to component. To receive some service, component A may call component B. A waits for B to respond before continuing execution. However, if component B fails to respond for some reason, then component A cannot continue. It simply waits indefinitely for a response. A person who is waiting for a response from the system sees a silent system failure, with no response from the system. They have no alternative but to kill the waiting process and restart the system.

To avoid this prospect, you should always include timeouts when calling external components. A timeout is an automatic assumption that a called component has failed and will not produce a response. You define a time period during which you expect to receive a response from a called component. If you have not received a response in that time, you assume failure and take back control from the called component. You can then attempt to recover from the failure or tell the system users what has happened and allow them to decide what to do.

Guideline 8: Name all constants that represent real-world values

All nontrivial programs include a number of constant values that represent the values of real-world entities. These values are not modified as the program executes. Sometimes, these are absolute constants and never change (e.g., the speed of light), but more often they are values that change relatively slowly over time. For example, a program to calculate personal tax will include constants that are the current tax rates. These change from year to year, and so the program must be updated with the new constant values.

You should always include a section in your program in which you name all real-world constant values that are used. When using the constants, you should refer to them by name rather than by their value. This has two advantages as far as dependability is concerned:

1. You are less likely to make mistakes and use the wrong value. It is easy to mistype a number, and the system will often be unable to detect a mistake. For example, say a tax rate is 34%. A simple transposition error might lead to this being mistyped as 43%. However, if you mistype a name (such as Standard-tax-rate), this error can be detected by the compiler as an undeclared variable.

2. When a value changes, you do not have to look through the whole program to discover where you have used that value. All you need do is to change the value associated with the constant declaration. The new value is then automatically included everywhere that it is needed.
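The tax-rate example as a named constant. The 34% figure comes from the text; the constant and function names are illustrative:

```python
# All real-world constant values are named in one place.
STANDARD_TAX_RATE = 0.34   # current standard rate; update only here

def tax_due(taxable_income: float) -> float:
    # Refer to the constant by name, never by its literal value.
    return taxable_income * STANDARD_TAX_RATE
```

When the rate changes, only the declaration is edited, and a mistyped name such as STANDARD_TAX_RTE is caught at the point of use rather than silently computing the wrong tax.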

11.5 Reliability measurement

To assess the reliability of a system, you have to collect data about its operation. The data required may include:

1. The number of system failures given a number of requests for system services. This is used to measure the POFOD and applies irrespective of the time over which the demands are made.

Figure 11.13 Statistical testing for reliability measurement (identify operational profiles, prepare test dataset, apply tests to system, compute observed reliability)

2. The time or the number of transactions between system failures plus the total elapsed time or total number of transactions. This is used to measure ROCOF and MTTF.

3. The repair or restart time after a system failure that leads to loss of service. This is used in the measurement of availability. Availability does not just depend on the time between failures but also on the time required to get the system back into operation.
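These three kinds of data map directly onto the metric computations. A minimal sketch, using the standard definitions of the metrics:

```python
def pofod(failures: int, demands: int) -> float:
    """Probability of failure on demand: observed failures
    per request for service."""
    return failures / demands

def rocof(failures: int, elapsed: float) -> float:
    """Rate of occurrence of failure per unit of time
    (or per transaction, for transaction-based systems)."""
    return failures / elapsed

def availability(uptime: float, downtime: float) -> float:
    """Fraction of time the system can deliver service; repair or
    restart time enters through the downtime term."""
    return uptime / (uptime + downtime)
```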

The time units that may be used in these metrics are calendar time or a discrete unit such as number of transactions. You should use calendar time for systems that are in continuous operation. Monitoring systems, such as process control systems, fall into this category. Therefore, the ROCOF might be the number of failures per day. Systems that process transactions such as bank ATMs or airline reservation systems have variable loads placed on them depending on the time of day. In these cases, the unit of "time" used could be the number of transactions; that is, the ROCOF would be number of failed transactions per N thousand transactions.

Reliability testing is a statistical testing process that aims to measure the reliability of a system. Reliability metrics such as POFOD, the probability of failure on demand, and ROCOF, the rate of occurrence of failure, may be used to quantitatively specify the required software reliability. In the reliability testing process, you can check whether the system has achieved that required reliability level.

The process of measuring the reliability of a system is sometimes called statistical testing (Figure 11.13). The statistical testing process is explicitly geared to reliability measurement rather than fault finding. Prowell et al. (Prowell et al. 1999) give a good description of statistical testing in their book on Cleanroom software engineering.

There are four stages in the statistical testing process:

1. You start by studying existing systems of the same type to understand how these are used in practice. This is important as you are trying to measure the reliability as experienced by system users. Your aim is to define an operational profile. An operational profile identifies classes of system inputs and the probability that these inputs will occur in normal use.

2. You then construct a set of test data that reflect the operational profile. This means that you create test data with the same probability distribution as the test data for the systems that you have studied. Normally, you use a test data generator to support this process.

3. You test the system using these data and count the number and type of failures that occur. The times of these failures are also logged. As I discussed in Chapter 10, the time units chosen should be appropriate for the reliability metric used.


4. After you have observed a statistically significant number of failures, you can compute the software reliability and work out the appropriate reliability metric value.
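Stages 1 and 2 can be sketched as a tiny operational-profile-driven test data generator. The input classes and their probabilities here are invented for illustration:

```python
import random

# Stage 1: operational profile -- input classes and the probability
# that each occurs in normal use (hypothetical ATM-style figures).
OPERATIONAL_PROFILE = {
    "balance_query": 0.70,
    "cash_withdrawal": 0.25,
    "statement_request": 0.05,
}

def generate_test_data(n, rng=random.Random(42)):
    """Stage 2: draw n test inputs with the same probability
    distribution as the operational profile."""
    classes = list(OPERATIONAL_PROFILE)
    weights = [OPERATIONAL_PROFILE[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n)
```

A real generator would also synthesize concrete input values within each class, not just class labels.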

This conceptually attractive approach to reliability measurement is not easy to apply in practice. The principal difficulties that arise are due to:

1. Operational profile uncertainty The operational profiles based on experience with other systems may not be an accurate reflection of the real use of the system.

2. High costs of test data generation It can be very expensive to generate the large volume of data required in an operational profile unless the process can be totally automated.

3. Statistical uncertainty when high reliability is specified You have to generate a statistically significant number of failures to allow accurate reliability measurements. When the software is already reliable, relatively few failures occur and it is difficult to generate new failures.

4. Recognizing failure It is not always obvious whether or not a system failure has occurred. If you have a formal specification, you may be able to identify deviations from that specification, but, if the specification is in natural language, there may be ambiguities that mean observers could disagree on whether the system has failed.

By far the best way to generate the large dataset required for reliability measurement is to use a test data generator, which can be set up to automatically generate inputs matching the operational profile. However, it is not usually possible to automate the production of all test data for interactive systems because the inputs are often a response to system outputs. Datasets for these systems have to be generated manually, with correspondingly higher costs. Even where complete automation is possible, writing commands for the test data generator may take a significant amount of time.

Statistical testing may be used in conjunction with fault injection to gather data about how effective the process of defect testing has been. Fault injection (Voas and McGraw 1997) is the deliberate injection of errors into a program. When the program is executed, these lead to program faults and associated failures. You then analyze the failure to discover if the root cause is one of the errors that you have added to the program. If you find that X% of the injected faults lead to failures, then proponents of fault injection argue that this suggests that the defect testing process will also have discovered X% of the actual faults in the program.

This approach assumes that the distribution and type of injected faults reflect the actual faults in the system. It is reasonable to think that this might be true for faults due to programming errors, but it is less likely to be true for faults resulting from requirements or design problems. Fault injection is ineffective in predicting the number of faults that stem from anything but programming errors.


Reliability growth modeling

A reliability growth model is a model of how the system reliability changes over time during the testing process. As system failures are discovered, the underlying faults causing these failures are repaired so that the reliability of the system should improve during system testing and debugging. To predict reliability, the conceptual reliability growth model must then be translated into a mathematical model.

http://software-engineering-book.com/web/reliability-growth-modeling/

11.5.1 Operational profiles

The operational profile of a software system reflects how it will be used in practice. It consists of a specification of classes of input and the probability of their occurrence. When a new software system replaces an existing automated system, it is reasonably easy to assess the probable pattern of usage of the new software. It should correspond to the existing usage, with some allowance made for the new functionality that is (presumably) included in the new software. For example, an operational profile can be specified for telephone switching systems because telecommunication companies know the call patterns that these systems have to handle.

Typically, the operational profile is such that the inputs that have the highest probability of being generated fall into a small number of classes, as shown on the left of Figure 11.14. There are many classes where inputs are highly improbable but not impossible. These are shown on the right of Figure 11.14. The ellipsis (. . .) means that there are many more of these uncommon inputs than are shown.

Musa (Musa 1998) discusses the development of operational profiles in telecommunication systems. As there is a long history of collecting usage data in that domain, the process of operational profile development is relatively straightforward. It simply reflects the historical usage data. For a system that required about 15 person-years of development effort, an operational profile was developed in about 1 person-month. In other cases, operational profile generation took longer (2–3 person-years), but the cost was spread over a number of system releases.

When a software system is new and innovative, however, it is difficult to anticipate how it will be used. Consequently, it is practically impossible to create an accurate operational profile. Many different users with different expectations, backgrounds, and experience may use the new system. There is no historical usage database. These users may make use of systems in ways that the system developers did not anticipate.

Developing an accurate operational profile is certainly possible for some types of system, such as telecommunication systems, that have a standardized pattern of use. However, for other types of system, developing an accurate operational profile may be difficult or impossible:

1. A system may have many different users who each have their own ways of using the system. As I explained earlier in this chapter, different users have different impressions of reliability because they use a system in different ways. It is difficult to match all of these patterns of use in a single operational profile.

Figure 11.14 Distribution of inputs in an operational profile (number of inputs plotted against input classes; a few classes account for most inputs, followed by a long tail of uncommon classes)

2. Users change the ways that they use a system over time. As users learn about a new system and become more confident with it, they start to use it in more sophisticated ways. Therefore, an operational profile that matches the initial usage pattern of a system may not be valid after users become familiar with the system.

For these reasons, it is often impossible to develop a trustworthy operational profile. If you use an out-of-date or incorrect operational profile, you cannot be confident about the accuracy of any reliability measurements that you make.

Key Points

Software reliability can be achieved by avoiding the introduction of faults, by detecting and removing faults before system deployment, and by including fault-tolerance facilities that allow the system to remain operational after a fault has caused a system failure.

Reliability requirements can be defined quantitatively in the system requirements specification. Reliability metrics include probability of failure on demand (POFOD), rate of occurrence of failure (ROCOF), and availability (AVAIL).

Functional reliability requirements are requirements for system functionality, such as checking and redundancy requirements, which help the system meet its non-functional reliability requirements.


Dependable system architectures are system architectures that are designed for fault tolerance. A number of architectural styles support fault tolerance, including protection systems, self-monitoring architectures, and N-version programming.

Software diversity is difficult to achieve because it is practically impossible to ensure that each version of the software is truly independent.

Dependable programming relies on including redundancy in a program as checks on the validity of inputs and the values of program variables.

Statistical testing is used to estimate software reliability. It relies on testing the system with test data that matches an operational profile, which reflects the distribution of inputs to the software when it is in use.

Further Reading

Software Fault Tolerance Techniques and Implementation. A comprehensive discussion of techniques to achieve software fault tolerance and fault-tolerant architectures. The book also covers general issues of software dependability. Reliability engineering is a mature area, and the techniques discussed here are still current. (L. L. Pullum, Artech House, 2001).

"Software Reliability Engineering: A Roadmap." This survey paper by a leading researcher in software reliability summarizes the state of the art in software reliability engineering and discusses research challenges in this area. (M. R. Lyu, Proc. Future of Software Engineering, IEEE Computer Society, 2007) http://dx.doi.org/10.1109/FOSE.2007.24

"Mars Code." This paper discusses the approach to reliability engineering used in the development of software for the Mars Curiosity Rover. This relied on the use of good programming practice, redundancy, and model checking (covered in Chapter 12). (G. J. Holzmann, Comm. ACM, 57 (2), 2014) http://dx.doi.org/10.1145/2560217.2560218

Website

PowerPoint slides for this chapter:

www.pearsonglobaleditions.com/Sommerville

Links to supporting videos:

http://software-engineering-book.com/videos/reliability-and-safety/

More information on the Airbus flight control system:

http://software-engineering-book.com/case-studies/airbus-340/


Exercises

11.1. Explain why it is practically impossible to validate reliability specifications when these are expressed in terms of a very small number of failures over the total lifetime of a system.

11.2. Suggest appropriate reliability metrics for the classes of software system below. Give reasons for your choice of metric. Predict the usage of these systems and suggest appropriate values for the reliability metrics.

a system that monitors patients in a hospital intensive care unit

a word processor

an automated vending machine control system

a system to control braking in a car

a system to control a refrigeration unit

a management report generator

11.3. Imagine that a network operations center monitors and controls the national telecommunications network of a country. This includes controlling and monitoring the operational status of switching and transmission equipment and keeping track of nationwide equipment inventories. The center needs to have redundant systems. Explain three reliability metrics you would use to specify the needs of such systems.

11.4. What is the common characteristic of all architectural styles that are geared to supporting software fault tolerance?

11.5. Suggest circumstances where it is appropriate to use a fault-tolerant architecture when implementing a software-based control system and explain why this approach is required.

11.6. You are responsible for the design of a communications switch that has to provide 24/7 availability but that is not safety-critical. Giving reasons for your answer, suggest an architectural style that might be used for this system.

11.7. It has been suggested that the control software for a radiation therapy machine, used to treat patients with cancer, should be implemented using N-version programming. Comment on whether or not you think this is a good suggestion.

11.8. Explain why all the versions in a system designed around software diversity may fail in a similar way.

11.9. Explain how programming language support of exception handling can contribute to the reliability of software systems.

11.10. Software failures can cause considerable inconvenience to users of the software. Is it ethical for companies to release software that they know includes faults that could lead to software failures? Should they be liable for compensating users for losses that are caused by the failure of their software? Should they be required by law to offer software warranties in the same way that consumer goods manufacturers must guarantee their products?


References

Avizienis, A. A. 1995. "A Methodology of N-Version Programming." In Software Fault Tolerance, edited by M. R. Lyu, 23–46. Chichester, UK: John Wiley & Sons.

Brilliant, S. S., J. C. Knight, and N. G. Leveson. 1990. "Analysis of Faults in an N-Version Software Experiment." IEEE Trans. on Software Engineering 16 (2): 238–247. doi:10.1109/32.44387.

Hatton, L. 1997. "N-Version Design Versus One Good Version." IEEE Software 14 (6): 71–76. doi:10.1109/52.636672.

Leveson, N. G. 1995. Safeware: System Safety and Computers. Reading, MA: Addison-Wesley.

Musa, J. D. 1998. Software Reliability Engineering: More Reliable Software, Faster Development and Testing. New York: McGraw-Hill.

Prowell, S. J., C. J. Trammell, R. C. Linger, and J. H. Poore. 1999. Cleanroom Software Engineering: Technology and Process. Reading, MA: Addison-Wesley.

Pullum, L. 2001. Software Fault Tolerance Techniques and Implementation. Norwood, MA: Artech House.

Randell, B. 2000. "Facing Up to Faults." Computer J. 43 (2): 95–106. doi:10.1093/comjnl/43.2.95.

Torres-Pomales, W. 2000. "Software Fault Tolerance: A Tutorial." NASA. http://ntrs.nasa.gov/archive/nasa/casi. . ./20000120144_2000175863.pdf

Voas, J., and G. McGraw. 1997. Software Fault Injection: Inoculating Programs Against Errors. New York: John Wiley & Sons.

12

Safety engineering

Objectives

The objective of this chapter is to explain techniques that are used to ensure safety when developing critical systems. When you have read this chapter, you will:

■ understand what is meant by a safety-critical system and why safety has to be considered separately from reliability in critical systems engineering;

■ understand how an analysis of hazards can be used to derive safety requirements;

■ know about processes and tools that are used for software safety assurance;

■ understand the notion of a safety case that is used to justify the safety of a system to regulators, and how formal arguments may be used in safety cases.

Contents

12.1 Safety-critical systems

12.2 Safety requirements

12.3 Safety engineering processes

12.4 Safety cases


In Section 11.2, I briefly described an air accident at Warsaw Airport where an Airbus crashed on landing. Two people were killed and 54 were injured. The subsequent inquiry showed that a major contributory cause of the accident was a failure of the control software that reduced the efficiency of the aircraft’s braking system. This is one of the, thankfully rare, examples of where the behavior of a software system has led to death or injury. It illustrates that software is now a central component in many systems that are critical to preserving and maintaining life. These are safety-critical software systems, and a range of specialized methods and techniques have been developed for safety-critical software engineering.

As I discussed in Chapter 10, safety is one of the principal dependability properties. A system can be considered to be safe if it operates without catastrophic failure, that is, failure that causes or may cause death or injury to people. Systems whose failure may lead to environmental damage may also be safety-critical, as environmental damage (such as a chemical leak) can lead to subsequent human injury or death.

Software in safety-critical systems has a dual role to play in achieving safety:

1. The system may be software-controlled so that the decisions made by the software and subsequent actions are safety-critical. Therefore, the software behavior is directly related to the overall safety of the system.

2. Software is extensively used for checking and monitoring other safety-critical components in a system. For example, all aircraft engine components are monitored by software looking for early indications of component failure. This software is safety-critical because, if it fails, other components may fail and cause an accident.

Safety in software systems is achieved by developing an understanding of the situations that might lead to safety-related failures. The software is engineered so that such failures do not occur. You might therefore think that if a safety-critical system is reliable and behaves as specified, it will therefore be safe. Unfortunately, it isn’t quite as simple as that. System reliability is necessary for safety achievement, but it isn’t enough. Reliable systems can be unsafe and vice versa. The Warsaw Airport accident was an example of such a situation, which I’ll discuss in more detail in Section 12.2.

Software systems that are reliable may not be safe for four reasons:

1. We can never be 100% certain that a software system is fault-free and fault-tolerant. Undetected faults can be dormant for a long time, and software failures can occur after many years of reliable operation.

2. The specification may be incomplete in that it does not describe the required behavior of the system in some critical situations. A high percentage of system malfunctions are the result of specification rather than design errors. In a study of errors in embedded systems, Lutz (Lutz 1993) concludes that “difficulties with requirements are the key root cause of the safety-related software errors, which have persisted until integration and system testing.†”

†Lutz, R. R. 1993. “Analyzing Software Requirements Errors in Safety-Critical Embedded Systems.” In RE’93, 126–133. San Diego, CA: IEEE. doi:10.1109/ISRE.1993.324825.


More recent work by Veras et al. (Veras et al. 2010) in space systems

confirms

that requirements errors are still a major problem for embedded systems.

3. Hardware malfunctions may cause sensors and actuators to behave in

an unpre-

dictable way. When components are close to physical failure, they may

behave

erratically and generate signals that are outside the ranges that can be

handled by

the software. The software may then either fail or wrongly interpret these

signals.

4. The system operators may generate inputs that are not individually

incorrect but that, in some situations, can lead to a system malfunction.

An anecdotal example of this occurred when an aircraft undercarriage

collapsed while the aircraft

was on the ground. Apparently, a technician pressed a button that

instructed the

utility management software to raise the undercarriage. The software

carried out

the mechanic’s instruction perfectly. However, the system should have

disal-

lowed the command unless the plane was in the air.

Therefore, safety has to be considered as well as reliability when

developing

safety-critical systems. The reliability engineering techniques that I

introduced in Chapter 11 are obviously applicable for safety-critical

systems engineering. I therefore do not discuss system architectures and

dependable programming here but

instead focus on techniques for improving and assuring system safety.

12.1 Safety-critical systems

Safety-critical systems are systems in which it is essential that system

operation is always safe. That is, the system should never damage people

or the system’s environment, irrespective of whether or not the system

conforms to its specification. Examples of safety-critical systems include

control and monitoring systems in aircraft, process control systems in

chemical and pharmaceutical plants, and automobile control systems.

Safety-critical software falls into two classes:

1. Primary safety-critical software This is software that is embedded as a controller in a system. Malfunctioning of such software can cause a hardware malfunction, which results in human injury or environmental damage. The insulin pump software that I introduced in Chapter 1 is an example of a primary safety-critical system. System failure may lead to user injury.

The insulin pump system is a simple system, but software control is also used in very complex safety-critical systems. Software rather than hardware control is essential because of the need to manage large numbers of sensors and actuators, which have complex control laws. For example, advanced, aerodynamically unstable, military aircraft require continual software-controlled adjustment of their flight surfaces to ensure that they do not crash.

2. Secondary safety-critical software This is software that can indirectly result in an injury. An example of such software is a computer-aided engineering design system whose malfunctioning might result in a design fault in the object being designed. This fault may cause injury to people if the designed system malfunctions. Another example of a secondary safety-critical system is the Mentcare system for mental health patient management. Failure of this system, whereby an unstable patient may not be treated properly, could lead to that patient injuring himself or others.

Some control systems, such as those controlling critical national infrastructure (electricity supply, telecommunications, sewage treatment, etc.), are secondary safety-critical systems. Failure of these systems is unlikely to have immediate human consequences. However, a prolonged outage of the controlled systems could lead to injury and death. For example, failure of a sewage treatment system could lead to a higher level of infectious disease as raw sewage is released into the environment.

I explained in Chapter 11 how software and system availability and reliability are achieved through fault avoidance, fault detection and removal, and fault tolerance. Safety-critical systems development uses these approaches and augments them with hazard-driven techniques that consider the potential system accidents that may occur:

1. Hazard avoidance The system is designed so that hazards are avoided. For example, a paper-cutting system that requires an operator to use two hands to press separate buttons simultaneously avoids the hazard of the operator’s hands being in the blade’s pathway.

2. Hazard detection and removal The system is designed so that hazards are detected and removed before they result in an accident. For example, a chemical plant system may detect excessive pressure and open a relief valve to reduce pressure before an explosion occurs.

3. Damage limitation The system may include protection features that minimize the damage that may result from an accident. For example, an aircraft engine normally includes automatic fire extinguishers. If there is an engine fire, it can often be controlled before it poses a threat to the aircraft.
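A hazard-avoidance design such as the two-button paper cutter can also be enforced in control software. The sketch below is purely illustrative (the function and signal names are invented, not taken from any real cutter controller): the hazardous action is only permitted while both buttons are held, so the operator's hands cannot be in the blade's pathway.

```python
# Illustrative hazard-avoidance interlock, modeled on the two-button
# paper cutter described in the text. All names are invented.

def blade_may_operate(left_button_pressed: bool, right_button_pressed: bool) -> bool:
    """The blade may run only while BOTH buttons are pressed.

    The safe state (blade stopped) is the default: releasing either
    button immediately withdraws permission to operate.
    """
    return left_button_pressed and right_button_pressed
```

Note that the check is phrased as permission to act rather than as a prohibition, so any sensor dropout or unpressed button defaults to the safe state.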

A hazard is a system state that could lead to an accident. Using the above example of the paper-cutting system, a hazard arises when the operator’s hand is in a position where the cutting blade could injure it. Hazards are not accidents—we often get ourselves into hazardous situations and get out of them without any problems. However, accidents are always preceded by hazards, so reducing hazards reduces accidents.

A hazard is one example of the specialized vocabulary that is used in

safety-critical systems engineering. I explain other terminology used in

safety-critical systems in Figure 12.1.

We are now actually pretty good at building systems that can cope with one thing going wrong. We can design mechanisms into the system that can detect and recover from single problems. However, when several things go wrong at the same time, accidents are more likely. As systems become more and more complex, we don’t understand the relationships between the different parts of the system. Consequently, we cannot predict the consequences of a combination of unexpected system events or failures.

In an analysis of serious accidents, Perrow (Perrow 1984) suggested that

almost

all of the accidents were due to a combination of failures in different parts

of a system.


Figure 12.1 Safety terminology

Accident (or mishap): An unplanned event or sequence of events that results in human death or injury, damage to property or to the environment. An overdose of insulin is an example of an accident.

Damage: A measure of the loss resulting from a mishap. Damage can range from many people being killed as a result of an accident to minor injury or property damage. Damage resulting from an overdose of insulin could lead to serious injury or the death of the user of the insulin pump.

Hazard: A condition with the potential for causing or contributing to an accident. A failure of the sensor that measures blood glucose is an example of a hazard.

Hazard probability: The probability of the events occurring which create a hazard. Probability values tend to be arbitrary but range from “probable” (say 1/100 chance of a hazard occurring) to “implausible” (no conceivable situations are likely in which the hazard could occur). The probability of a sensor failure in the insulin pump that overestimates the user’s blood sugar level is low.

Hazard severity: An assessment of the worst possible damage that could result from a particular hazard. Hazard severity can range from catastrophic, where many people are killed, to minor, where only minor damage results. When an individual death is a possibility, a reasonable assessment of hazard severity is “very high.”

Risk: A measure of the probability that the system will cause an accident. The risk is assessed by considering the hazard probability, the hazard severity, and the probability that the hazard will lead to an accident. The risk of an insulin overdose is medium to low.

Unanticipated combinations of subsystem failures led to interactions that

resulted in overall system failure. For example, failure of an air

conditioning system may lead

to overheating. Once hardware gets hot, its behavior becomes

unpredictable, so

overheating may lead to the system hardware generating incorrect signals.

These

wrong signals may then cause the software to react incorrectly.

Perrow made the point that, in complex systems, it is impossible to anticipate all possible combinations of failures. He therefore coined the phrase “normal accidents,” with the implication that accidents have to be considered as inevitable when we build complex safety-critical systems.

To reduce complexity, we could use simple hardware controllers rather than software control. However, software-controlled systems can monitor a wider range of conditions than simpler electromechanical systems. They can be adapted relatively easily. They use computer hardware, which has high inherent reliability and which is physically small and lightweight.

Software-controlled systems can provide sophisticated safety interlocks.

They

can support control strategies that reduce the amount of time people need

to spend in hazardous environments. Although software control may

introduce more ways in

which a system can go wrong, it also allows better monitoring and

protection.

Therefore, software control can contribute to improvements in system

safety.

It is important to maintain a sense of proportion about safety-critical systems. Critical software systems operate without problems most of the time. Relatively few people worldwide have been killed or injured because of faulty software. Perrow is right in saying that accidents will always be a possibility. It is impossible to make a system 100% safe, and society has to decide whether or not the consequences of an occasional accident are worth the benefits that come from the use of advanced technologies.

Risk-based requirements specification

Risk-based specification is an approach that has been widely used by safety and security-critical systems developers. It focuses on those events that could cause the most damage or that are likely to occur frequently. Events that have only minor consequences or that are extremely rare may be ignored. The risk-based specification process involves understanding the risks faced by the system, discovering their root causes, and generating requirements to manage these risks.

http://software-engineering-book.com/web/risk-based-specification/

12.2 Safety requirements

In the introduction to this chapter, I described an air accident at Warsaw Airport where the braking system on an Airbus failed. The inquiry into this accident showed that the braking system software had operated according to its specification. There were no errors in the program. However, the software specification was incomplete and had not taken into account a rare situation, which arose in this case. The software worked, but the system failed.

This episode illustrates that system safety does not just depend on good engineering. It requires attention to detail when the system requirements are derived and the inclusion of special software requirements that are geared to ensuring the safety of a system. Safety requirements are functional requirements, which define checking and recovery facilities that should be included in the system and features that provide protection against system failures and external attacks.

The starting point for generating functional safety requirements is usually

domain

knowledge, safety standards, and regulations. These lead to high-level

requirements

that are perhaps best described as “shall not” requirements. By contrast

with normal functional requirements that define what the system shall do,

“shall not” requirements define system behavior that is unacceptable.

Examples of “shall not” requirements are:

■ “The system shall not allow reverse thrust mode to be selected when the aircraft is in flight.”

■ “The system shall not allow the simultaneous activation of more than three alarm signals.”

■ “The navigation system shall not allow users to set the required destination when the car is moving.”

These “shall not” requirements cannot be implemented directly but have

to be

decomposed into more specific software functional requirements.

Alternatively,

they may be implemented through system design decisions such as a

decision to use

particular types of equipment in the system.
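To illustrate what such a decomposition might look like, the sketch below turns the first “shall not” requirement above into a concrete, testable guard. The on-ground conditions used (weight on the landing gear and wheel rotation) are illustrative assumptions for this sketch, not taken from any avionics standard:

```python
# Hypothetical decomposition of "the system shall not allow reverse thrust
# mode to be selected when the aircraft is in flight" into a positive,
# checkable condition. The signals used here are illustrative assumptions.

def reverse_thrust_selectable(weight_on_wheels: bool, wheels_rotating: bool) -> bool:
    # "On the ground" is inferred from two independent signals; if either
    # is absent, the system conservatively assumes the aircraft is in flight
    # and refuses to enable reverse thrust.
    return weight_on_wheels and wheels_rotating
```

Note how the decomposition forces a precise definition of “in flight”; the Warsaw accident discussed earlier in this section arose because the specification’s notion of being on the ground did not cover a rare landing situation.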


Figure 12.2 Hazard-driven requirements specification. The process runs hazard identification, hazard assessment, hazard analysis, and risk reduction in sequence; their respective outputs are the hazard register, hazard probability and acceptability assessments, root cause analyses, and the safety requirements specification.

Safety requirements are primarily protection requirements and are not

concerned

with normal system operation. They may specify that the system should be

shut down

so that safety is maintained. In deriving safety requirements, you therefore

need to find an acceptable balance between safety and functionality and

avoid overprotection. There is no point in building a very safe system if it

does not operate in a cost-effective way.

Risk-based requirements specification is a general approach used in

critical systems engineering where risks faced by the system are identified

and requirements to avoid or mitigate these risks are identified. It may be

used for all types of dependability requirements. For safety-critical

systems, it translates into a process driven by identified hazards. As I

discussed in the previous section, a hazard is something that could (but

need not) result in death or injury to a person.

There are four activities in a hazard-driven safety specification process:

1. Hazard identification The hazard identification process identifies hazards

that may threaten the system. These hazards may be recorded in a hazard

register.

This is a formal document that records the safety analyses and assessments

and

that may be submitted to a regulator as part of a safety case.

2. Hazard assessment The hazard assessment process decides which hazards

are the most dangerous and/or the most likely to occur. These should be

prioritized

when deriving safety requirements.

3. Hazard analysis This is a process of root-cause analysis that identifies

the events that can lead to the occurrence of a hazard.

4. Risk reduction This process is based on the outcome of hazard analysis

and leads to identification of safety requirements. These requirements may

be concerned with ensuring that a hazard does not arise or lead to an

accident or that if

an accident does occur, the associated damage is minimized.

Figure 12.2 illustrates this hazard-driven safety requirements specification

process.

12.2.1 Hazard identification

In safety-critical systems, hazard identification starts by identifying

different classes of hazards, such as physical, electrical, biological,

radiation, and service failure hazards.

Each of these classes can then be analyzed to discover specific hazards

that could occur.

Possible combinations of hazards that are potentially dangerous must also

be identified.


Experienced engineers, working with domain experts and professional

safety

advisers, identify hazards from previous experience and from an analysis

of the application domain. Group working techniques such as

brainstorming may be used, where

a group meets to exchange ideas. For the insulin pump system, people

who may be

involved include doctors, medical physicists and engineers, and software

designers.

The insulin pump system that I introduced in Chapter 1 is a safety-critical

system,

because failure can cause injury or even death to the system user.

Accidents that may occur when using this machine include the user

suffering from long-term consequences of poor blood sugar control (eye,

heart, and kidney problems), cognitive

dysfunction as a result of low blood sugar levels, or the occurrence of

some other

medical conditions, such as an allergic reaction.

Some of the hazards that may arise in the insulin pump system are:

■ insulin overdose computation (service failure);

■ insulin underdose computation (service failure);

■ failure of the hardware monitoring system (service failure);

■ power failure due to exhausted battery (electrical);

■ electrical interference with other medical equipment such as a heart pacemaker (electrical);

■ poor sensor and actuator contact caused by incorrect fitting (physical);

■ parts of machine breaking off in patient’s body (physical);

■ infection caused by introduction of machine (biological); and

■ allergic reaction to the materials or insulin used in the machine (biological).

Software-related hazards are normally concerned with failure to deliver a

system

service or with the failure of monitoring and protection systems.

Monitoring and

protection systems may be included in a device to detect conditions, such

as a low

battery level, which could lead to device failure.

A hazard register may be used to record the identified hazards with an

explanation of why the hazard has been included. The hazard register is

an important legal document that records all safety-related decisions

about each hazard. It can be used to show that the requirements engineers

have paid due care and attention in considering all foreseeable hazards

and that these hazards have been analyzed. In the event of an accident,

the hazard register may be used in a subsequent inquiry or legal

proceedings to show that the system developers have not been negligent in

their system safety analysis.
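A hazard register is a structured document, so its entries can usefully be modeled as records. The sketch below is purely illustrative (the field names and the example entry are invented for this sketch, not drawn from any safety standard): each entry records the hazard, its class, the rationale for including it, and the safety-related decisions made about it.

```python
# Illustrative model of a hazard register entry; all names are invented.
from dataclasses import dataclass, field
from typing import List

@dataclass
class HazardEntry:
    identifier: str        # e.g. "IP-H1"
    description: str       # the hazard itself
    hazard_class: str      # physical, electrical, biological, radiation, or service failure
    rationale: str         # why this hazard has been included in the register
    decisions: List[str] = field(default_factory=list)  # safety-related decisions, dated

# A register is then simply an ordered collection of entries.
register: List[HazardEntry] = [
    HazardEntry(
        identifier="IP-H1",
        description="Insulin overdose computation",
        hazard_class="service failure",
        rationale="An incorrect dose computation can directly injure the user",
    )
]
register[0].decisions.append("Fault tree analysis performed; risk assessed as intolerable")
```

Keeping the decision history with each entry supports the register’s legal role: it records what was considered, when, and why.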

12.2.2 Hazard assessment

The hazard assessment process focuses on understanding the factors that lead to the occurrence of a hazard and the consequences if an accident or incident associated with that hazard should occur. You need to carry out this analysis to understand whether a hazard is a serious threat to the system or environment. The analysis also provides a basis for deciding on how to manage the risk associated with the hazard.

Figure 12.3 The risk triangle: an unacceptable region at the top, where the risk cannot be tolerated; an ALARP region in the middle, where the risk is tolerated only if risk reduction is impractical or excessively expensive; and an acceptable region at the apex, where the risk is negligible.

For each hazard, the outcome of the analysis and classification process is a statement of acceptability. This is expressed in terms of risk, where the risk takes into account the likelihood of an accident and its consequences. There are three risk categories that are used in hazard assessment:

1. Intolerable risks in safety-critical systems are those that threaten human life. The system must be designed so that such hazards either cannot arise or, if they do, features in the system will ensure that they are detected before they cause an accident. In the case of the insulin pump, an intolerable risk is that an overdose of insulin should be delivered.

2. As low as reasonably practical (ALARP) risks are those that have less serious consequences or that are serious but have a very low probability of occurrence. The system should be designed so that the probability of an accident arising because of a hazard is minimized, subject to other considerations such as cost and delivery. An ALARP risk for an insulin pump might be the failure of the hardware monitoring system. The consequences of this failure are, at worst, a short-term insulin underdose. This situation would not lead to a serious accident.

3. Acceptable risks are those where the associated accidents normally result in minor damage. System designers should take all possible steps to reduce “acceptable” risks, as long as these measures do not significantly increase costs, delivery time, or other non-functional system attributes. An acceptable risk in the case of the insulin pump might be the risk of an allergic reaction arising in the user. This reaction usually causes only minor skin irritation. It would not be worth using special, more expensive materials in the device to reduce this risk.

Figure 12.3 shows these three regions. The width of the triangle reflects the costs of ensuring that risks do not result in incidents or accidents. The highest costs are incurred by risks at the top of the diagram, the lowest costs by risks at the apex of the triangle.

Figure 12.4 Risk classification for the insulin pump

Identified hazard                           Hazard probability   Accident severity   Estimated risk   Acceptability
1. Insulin overdose computation             Medium               High                High             Intolerable
2. Insulin underdose computation            Medium               Low                 Low              Acceptable
3. Failure of hardware monitoring system    Medium               Medium              Low              ALARP
4. Power failure                            High                 Low                 Low              Acceptable
5. Machine incorrectly fitted               High                 High                High             Intolerable
6. Machine breaks in patient                Low                  High                Medium           ALARP
7. Machine causes infection                 Medium               Medium              Medium           ALARP
8. Electrical interference                  Low                  High                Medium           ALARP
9. Allergic reaction                        Low                  Low                 Low              Acceptable

The boundaries between the regions in Figure 12.3 are not fixed but depend on how acceptable risks are in the societies where the system will be deployed. This varies from country to country—some societies are more risk averse and litigious than others. Over time, however, all societies have become more risk-averse, so the boundaries have moved downward. For rare events, the financial costs of accepting risks and paying for any resulting accidents may be less than the costs of accident prevention. However, public opinion may demand that money be spent to reduce the likelihood of a system accident irrespective of cost.

For example, it may be cheaper for a company to clean up pollution on the rare occasion it occurs, rather than to install systems for pollution prevention. However, because the public and the media will not tolerate such accidents, clearing up the damage rather than preventing the accident is no longer acceptable. Events in other systems may also lead to a reclassification of risk. For example, risks that were thought to be improbable (and hence in the ALARP region) may be reclassified as intolerable because of external events, such as terrorist attacks, or natural phenomena, such as tsunamis.

Figure 12.4 shows a risk classification for the hazards identified in the

previous

section for the insulin delivery system. I have separated the hazards that

relate to the incorrect computation of insulin into an insulin overdose and

an insulin underdose.

An insulin overdose is potentially more serious than an insulin underdose in the short term. Insulin overdose can result in cognitive dysfunction, coma, and ultimately death. Insulin underdoses lead to high levels of blood sugar. In the short term, these high levels cause tiredness but are not very serious; in the longer term, however, they can lead to serious heart, kidney, and eye problems.

Hazards 4–9 in Figure 12.4 are not software related, but software nevertheless has a role to play in hazard detection. The hardware monitoring software should monitor the system state and warn of potential problems. The warning will often allow the hazard to be detected before it causes an accident. Examples of hazards that might be detected are power failure, which is detected by monitoring the battery, and incorrect fitting of the machine, which may be detected by monitoring signals from the blood sugar sensor.

The monitoring software in the system is, of course, safety-related. Failure

to detect a hazard could result in an accident. If the monitoring system

fails but the hardware is working correctly, then this is not a serious

failure. However, if the monitoring system fails and hardware failure

cannot then be detected, then this could have more serious consequences.

Hazard assessment involves estimating the hazard probability and risk

severity.

This is difficult as hazards and accidents are uncommon. Consequently,

the engineers involved may not have direct experience of previous

incidents or accidents. In estimating probabilities and accident severity, it

makes sense to use relative terms such as probable, unlikely, rare, high,

medium, and low. Quantifying these terms is practically impossible because

not enough statistical data is available for most types of accident.
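Because only relative terms are practical, a hazard assessment is often organized as a simple lookup from (hazard probability, accident severity) to an estimated risk and an acceptability class. The mapping below is a simplified illustration (the matrix values are assumptions for this sketch, not from any standard, and a full assessment would also weigh the probability that the hazard actually leads to an accident):

```python
# Illustrative risk matrix using the relative terms discussed in the text.
# The cell values are assumptions for illustration, not from a standard.
RISK = {
    ("high", "high"): "high",     ("high", "medium"): "high",     ("high", "low"): "low",
    ("medium", "high"): "high",   ("medium", "medium"): "medium", ("medium", "low"): "low",
    ("low", "high"): "medium",    ("low", "medium"): "low",       ("low", "low"): "low",
}

# Map estimated risk onto the three categories used in hazard assessment.
ACCEPTABILITY = {"high": "intolerable", "medium": "ALARP", "low": "acceptable"}

def assess(hazard_probability: str, accident_severity: str) -> tuple:
    """Return (estimated risk, acceptability) for one identified hazard."""
    risk = RISK[(hazard_probability, accident_severity)]
    return risk, ACCEPTABILITY[risk]
```

For example, a medium-probability, high-severity hazard such as the insulin overdose computation comes out as a high, intolerable risk, in line with its classification in Figure 12.4.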

12.2.3 Hazard analysis

Hazard analysis is the process of discovering the root causes of hazards in

a safety-critical system. Your aim is to find out what events or

combination of events could cause a system failure that results in a

hazard. To do this, you can use either a top-down or a bottom-up

approach. Deductive, top-down techniques, which are easier to use, start

with the hazard and work from that to the possible system failure.

Inductive, bottom-up techniques start with a proposed system failure and

identify what hazards might result from that failure.

Various techniques have been proposed as possible approaches to hazard decomposition or analysis (Storey 1996). One of the most commonly used techniques is fault tree analysis, a top-down technique that was developed for the analysis of both hardware and software hazards (Leveson, Cha, and Shimeall 1991). This technique is fairly easy to understand without specialist domain knowledge.

To do a fault tree analysis, you start with the hazards that have been

identified.

For each hazard, you then work backwards to discover the possible causes

of that

hazard. You put the hazard at the root of the tree and identify the system

states that can lead to that hazard. For each of these states, you then

identify further system

states that can lead to them. You continue this decomposition until you

reach the root cause(s) of the risk. Hazards that can only arise from a

combination of root causes

are usually less likely to lead to an accident than hazards with a single

root cause.
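The backwards decomposition described above maps naturally onto a tree data structure. The sketch below is an illustration of the idea (the class and its names are invented for this sketch): each node is either a basic event or an OR/AND gate over its children, and evaluating the tree shows whether a given set of basic failures produces the root hazard.

```python
# Illustrative fault tree representation; names are invented for this sketch.
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Node:
    label: str
    gate: str = "basic"                # "basic", "or", or "and"
    children: List["Node"] = field(default_factory=list)

    def occurs(self, failed_events: Set[str]) -> bool:
        """True if this node's condition arises given the basic events that failed."""
        if self.gate == "basic":
            return self.label in failed_events
        results = [child.occurs(failed_events) for child in self.children]
        return any(results) if self.gate == "or" else all(results)

# A fragment of the insulin pump tree: incorrect measurement of the sugar level.
sugar_computation = Node("Sugar computation error", "or",
                         [Node("Algorithm error"), Node("Arithmetic error")])
incorrect_measurement = Node("Incorrect sugar level measured", "or",
                             [Node("Sensor failure"), sugar_computation])
```

Because the gates here are all ORs, a single root cause (say, a sensor failure) is enough to raise the hazard; a hazard guarded by AND gates would need a combination of failures, which is usually less likely.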

Figure 12.5 is a fault tree for the software-related hazards in the insulin

delivery system that could lead to an incorrect dose of insulin being

delivered. In this case, I have merged insulin underdose and insulin

overdose into a single hazard, namely, “incorrect insulin dose

administered.” This reduces the number of fault trees that are required. Of

course, when you specify how the software should react to this hazard,

you have to

distinguish between an insulin underdose and an insulin overdose. As I

have said, they are not equally serious—in the short term, an overdose is

the more serious hazard.

From Figure 12.5, you can see that:

1. Three conditions could lead to the administration of an incorrect dose of insulin. (1) The level of blood sugar may have been incorrectly measured, so the insulin requirement has been computed with an incorrect input. (2) The delivery system may not respond correctly to commands specifying the amount of insulin to be injected. Alternatively, (3) the dose may be correctly computed, but it is delivered too early or too late.

Figure 12.5 An example of a fault tree

Incorrect insulin dose administered (or)
    Incorrect sugar level measured (or)
        Sensor failure
        Sugar computation error (or)
            Algorithm error
            Arithmetic error
    Correct dose delivered at wrong time (or)
        Timer failure
    Delivery system failure (or)
        Insulin computation incorrect (or)
            Algorithm error
            Arithmetic error
        Pump signals incorrect

2. The left branch of the fault tree, concerned with incorrect measurement of the blood sugar level, identifies how this might happen. This could occur either because the sensor that provides an input to calculate the sugar level has failed or because the calculation of the blood sugar level has been carried out incorrectly. The sugar level is calculated from some measured parameter, such as the conductivity of the skin. Incorrect computation can result from either an incorrect algorithm or an arithmetic error that results from the use of floating-point numbers.

3. The central branch of the tree is concerned with timing problems and concludes that these can only result from system timer failure.

4. The right branch of the tree, concerned with delivery system failure, examines possible causes of this failure. These could result from an incorrect computation of the insulin requirement or from a failure to send the correct signals to the pump that delivers the insulin. Again, an incorrect computation can result from algorithm failure or arithmetic errors.
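Because every gate in Figure 12.5 is an OR gate, each leaf event is, on its own, a root cause of the top-level hazard. The structure of the tree can be sketched as a simple data structure. This is an illustrative sketch, not part of the pump software; the Node class and its method are my own, with node names taken from the figure.

```python
# Illustrative sketch: the Figure 12.5 fault tree as nested OR nodes.
# Leaves (nodes with no children) are root causes; an OR node's hazard
# arises if any one of its children arises.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # empty => root cause (leaf)

    def root_causes(self):
        """Collect the leaf events that can each, alone, cause this hazard."""
        if not self.children:
            return [self.name]
        causes = []
        for child in self.children:
            causes.extend(child.root_causes())
        return causes

incorrect_dose = Node("Incorrect insulin dose administered", [
    Node("Incorrect sugar level measured", [
        Node("Sensor failure"),
        Node("Sugar computation error", [
            Node("Algorithm error"), Node("Arithmetic error")]),
    ]),
    Node("Correct dose delivered at wrong time", [Node("Timer failure")]),
    Node("Delivery system failure", [
        Node("Insulin computation incorrect", [
            Node("Algorithm error"), Node("Arithmetic error")]),
        Node("Pump signals incorrect"),
    ]),
])

print(incorrect_dose.root_causes())
```

Walking the tree like this lists every single-point root cause; in a tree with AND gates, by contrast, some causes would only matter in combination.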

Fault trees are also used to identify potential hardware problems. Hardware fault trees may provide insights into requirements for software to detect and, perhaps, correct these problems. For example, insulin doses are not administered frequently—no more than five or six times per hour and sometimes less often than that. Therefore, processor capacity is available to run diagnostic and self-checking programs. Hardware errors such as sensor, pump, or timer errors can be discovered and warnings issued before they have a serious effect on the patient.

12.2.4 Risk reduction

Once potential risks and their root causes have been identified, you are then able to derive safety requirements that manage the risks and ensure that incidents or accidents do not occur. You can use three possible strategies:

1. Hazard avoidance, where a system is designed so that the hazard cannot occur.

2. Hazard detection and removal, where a system is designed so that hazards are detected and neutralized before they result in an accident.

3. Damage limitation, where a system is designed so that the consequences of an accident are minimized.

Normally, designers of critical systems use a combination of these approaches. In a safety-critical system, intolerable hazards may be handled by minimizing their probability and adding a protection system (see Chapter 11) that provides a safety backup. For example, in a chemical plant control system, the system will attempt to detect and avoid excess pressure in the reactor. However, there may also be an independent protection system that monitors the pressure and opens a relief valve if high pressure is detected.

In the insulin delivery system, a safe state is a shutdown state where no insulin is injected. Over a short period, this is not a threat to the diabetic’s health. For the software failures that could lead to an incorrect dose of insulin, the following “solutions” might be developed:

1. Arithmetic error This error may occur when an arithmetic computation causes a representation failure. The specification should identify all possible arithmetic errors that may occur and state that an exception handler must be included for each possible error. The specification should set out the action to be taken for each of these errors. The default safe action is to shut down the delivery system and activate a warning alarm.

2. Algorithmic error This is a more difficult situation as there is no clear program exception that must be handled. This type of error could be detected by comparing the required insulin dose computed with the previously delivered dose. If it is much higher, this may mean that the amount has been computed incorrectly. The system may also keep track of the dose sequence. After a number of above-average doses have been delivered, a warning may be issued and further dosage limited.

Figure 12.6 Examples of safety requirements

SR1: The system shall not deliver a single dose of insulin that is greater than a specified maximum dose for a system user.

SR2: The system shall not deliver a daily cumulative dose of insulin that is greater than a specified maximum daily dose for a system user.

SR3: The system shall include a hardware diagnostic facility that shall be executed at least four times per hour.

SR4: The system shall include an exception handler for all of the exceptions that are identified in Table 3.

SR5: The audible alarm shall be sounded when any hardware or software anomaly is discovered and a diagnostic message as defined in Table 4 shall be displayed.

SR6: In the event of an alarm, insulin delivery shall be suspended until the user has reset the system and cleared the alarm.

Note: Tables 3 and 4 relate to tables that are included in the requirements document; they are not shown here.
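The two failure responses just described, an exception handler that moves to the safe shutdown state and a plausibility check against the previously delivered dose, can be sketched in code. This is an illustrative sketch only: compute_dose, MAX_REASONABLE_RATIO, and the dose formula are hypothetical names invented here, not taken from the insulin pump specification.

```python
# Illustrative sketch of the two failure responses described above. The
# names (compute_dose, MAX_REASONABLE_RATIO) and the dose formula are
# hypothetical; they are not taken from the insulin pump specification.

MAX_REASONABLE_RATIO = 1.5   # a dose far above the previous one is suspect

def compute_dose(sugar_level):
    # Stand-in for the real dose computation.
    return max(0.0, (sugar_level - 6.0) * 2.0)

class PumpController:
    def __init__(self):
        self.last_dose = None
        self.shut_down = False

    def safe_shutdown(self, reason):
        # Safe state: suspend insulin delivery and sound the warning alarm.
        self.shut_down = True
        print(f"ALARM: {reason}; insulin delivery suspended")

    def deliver(self, sugar_level):
        try:
            dose = compute_dose(sugar_level)
        except (OverflowError, ZeroDivisionError, FloatingPointError) as exc:
            # Arithmetic error: a handler is included for each possible error.
            self.safe_shutdown(f"arithmetic error: {exc}")
            return None
        # Algorithmic error check: compare with the previously delivered dose.
        if self.last_dose is not None and dose > self.last_dose * MAX_REASONABLE_RATIO:
            self.safe_shutdown("computed dose inconsistent with previous dose")
            return None
        self.last_dose = dose
        return dose
```

The design choice here follows the chapter's point: when either check fails, the controller delivers nothing and falls back to the safe shutdown state rather than guessing a dose.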

Some of the resulting safety requirements for the insulin pump software are shown in Figure 12.6. The requirements in Figure 12.6 are user requirements. Naturally, they would be expressed in more detail in the system requirements specification.
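Requirements SR1 and SR2 in Figure 12.6 translate directly into a guard applied to every computed dose. A minimal sketch, assuming per-user maximums have been configured for the user (the class and attribute names are hypothetical):

```python
# Minimal sketch of requirements SR1 and SR2 from Figure 12.6.
# max_single_dose and max_daily_dose stand for the per-user maximums
# that the requirements say must be specified for each system user.

class DoseLimiter:
    def __init__(self, max_single_dose, max_daily_dose):
        self.max_single_dose = max_single_dose
        self.max_daily_dose = max_daily_dose
        self.delivered_today = 0.0

    def approve(self, dose):
        """Return the dose to deliver, capped so SR1 and SR2 cannot be violated."""
        dose = min(dose, self.max_single_dose)                        # SR1
        dose = min(dose, self.max_daily_dose - self.delivered_today)  # SR2
        dose = max(dose, 0.0)
        self.delivered_today += dose
        return dose
```

For example, with a 5-unit single-dose limit and a 20-unit daily limit, a request for 8 units is capped at 5, and once 20 units have been delivered in a day any further request yields 0.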

12.3 Safety engineering processes

The software processes used to develop safety-critical software are based on the processes used in software reliability engineering. In general, a great deal of care is taken in developing a complete, and often very detailed, system specification. The design and implementation of the system usually follow a plan-based, waterfall model, with reviews and checks at each stage in the process. Fault avoidance and fault detection are the drivers of the process. For some types of system, such as aircraft systems, fault-tolerant architectures, as I discussed in Chapter 11, may be used.

Reliability is a prerequisite for safety-critical systems. Because of the very high costs and potentially tragic consequences of system failure, additional verification activities may be used in safety-critical systems development. These activities may include developing formal models of a system, analyzing them to discover errors and inconsistencies, and using static analysis software tools that parse the software source code to discover potential faults.

Safe systems have to be reliable, but, as I have discussed, reliability is not enough. Requirements and verification errors and omissions may mean that reliable systems are unsafe. Therefore, safety-critical systems development processes should include safety reviews, where engineers and system stakeholders examine the work done and explicitly look for potential issues that could affect the safety of the system.

Some types of safety-critical systems are regulated, as I explained in Chapter 10. National and international regulators require detailed evidence that the system is safe. This evidence might include:

1. The specification of the system that has been developed and records of the checks made on that specification.

2. Evidence of the verification and validation processes that have been carried out and the results of the system verification and validation.

3. Evidence that the organizations developing the system have defined and dependable software processes that include safety assurance reviews. There must also be records showing that these processes have been properly enacted.

Not all safety-critical systems are regulated. For example, there is no regulator for automobiles, although cars now have many embedded computer systems. The safety of car-based systems is the responsibility of the car manufacturer. However, because of the possibility of legal action in the event of an accident, developers of unregulated systems have to maintain the same detailed safety information. If a case is brought against them, they have to be able to show that they have not been negligent in the development of the car’s software.

The need for this extensive process and product documentation is another reason why agile processes cannot be used, without significant change, for safety-critical systems development. Agile processes focus on the software itself and (rightly) argue that a great deal of process documentation is never actually used after it has been produced. However, where you have to keep records for legal or regulatory reasons, you must maintain documentation about both the processes used and the system itself.

Safety-critical systems, like other types of system that have high dependability requirements, need to be based on dependable processes (see Chapter 10). A dependable process will normally include activities such as requirements management, change management and configuration control, system modeling, reviews and inspections, test planning, and test coverage analysis. When a system is safety-critical, there may be additional safety assurance and verification and analysis processes.

12.3.1 Safety assurance processes

Safety assurance is a set of activities that check that a system will operate safely. Specific safety assurance activities should be included at all stages in the software development process. These activities record the safety analyses that have been carried out and the person or persons responsible for these analyses. Safety assurance activities have to be thoroughly documented. This documentation may be part of the evidence that is used to convince a regulator or system owner that a system will operate safely.


Examples of safety assurance activities are:

1. Hazard analysis and monitoring, where hazards are traced from preliminary hazard analysis through to testing and system validation.

2. Safety reviews, which are used throughout the development process.

3. Safety certification, where the safety of critical components is formally certified. This involves a group external to the system development team examining the available evidence and deciding whether or not a system or component should be considered to be safe before it is made available for use.

To support these safety assurance processes, project safety engineers should be appointed who have explicit responsibility for the safety aspects of a system. These individuals will be accountable if a safety-related system failure occurs. They must be able to demonstrate that the safety assurance activities have been properly carried out.

Safety engineers work with quality managers to ensure that a detailed configuration management system is used to track all safety-related documentation and keep it in step with the associated technical documentation. There is little point in having stringent validation procedures if a failure of configuration management means that the wrong system is delivered to the customer. Quality and configuration management are covered in Chapters 24 and 25.

Hazard analysis is an essential part of safety-critical systems development. It involves identifying hazards, their probability of occurrence, and the probability of a hazard leading to an accident. If there is program code that checks for and handles each hazard, then you can argue that these hazards will not result in accidents. Where external certification is required before a system is used (e.g., in an aircraft), it is usually a condition of certification that this traceability can be demonstrated.

The central safety document that should be produced is the hazard register. This document provides evidence of how identified hazards have been taken into account during software development. The hazard register is used at each stage of the software development process to document how that development stage has taken the hazards into account.

A simplified example of a hazard register entry for the insulin delivery system is shown in Figure 12.7. This register documents the process of hazard analysis and shows design requirements that have been generated during this process. These design requirements are intended to ensure that the control system can never deliver an insulin overdose to a user of the insulin pump.

Individuals who have safety responsibilities should be explicitly identified in the hazard register. Personal identification is important for two reasons:

1. When people are identified, they can be held accountable for their actions. They are likely to take more care because any problems can be traced back to their work.

2. In the event of an accident, there may be legal proceedings or an inquiry. It is important to be able to identify those responsible for safety assurance so that they can defend their actions as part of the legal process.

Figure 12.7 A simplified hazard register entry

Hazard Register. Page 4: Printed 20.02.2012
System: Insulin Pump System File: InsulinPump/Safety/HazardLog
Safety Engineer: James Brown Log version: 1/3

Identified Hazard: Insulin overdose delivered to patient
Identified by: Jane Williams
Criticality class: 1
Identified risk: High
Fault tree identified: YES Date: 24.01.11 Location: Hazard register, Page 5
Fault tree creators: Jane Williams and Bill Smith
Fault tree checked: YES Date: 28.01.11 Checker: James Brown

System safety design requirements:

1. The system shall include self-testing software that will test the sensor system, the clock, and the insulin delivery system.

2. The self-checking software shall be executed once per minute.

3. In the event of the self-checking software discovering a fault in any of the system components, an audible warning shall be issued and the pump display shall indicate the name of the component where the fault has been discovered. The delivery of insulin shall be suspended.

4. The system shall incorporate an override system that allows the system user to modify the computed dose of insulin that is to be delivered by the system.

5. The amount of override shall be no greater than a pre-set value (maxOverride), which is set when the system is configured by medical staff.
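One way to keep register entries in step with development is to hold them as structured data rather than free text, so that tools can trace each hazard through the process. The sketch below mirrors some of the Figure 12.7 fields; the class itself is illustrative, not something the book prescribes.

```python
# Illustrative sketch: a hazard register entry as structured data so that
# hazards and responsible individuals can be traced through development.
# Field names follow Figure 12.7; the class design is hypothetical.

from dataclasses import dataclass, field

@dataclass
class HazardRegisterEntry:
    hazard: str
    identified_by: str
    criticality_class: int
    risk: str
    fault_tree_checked: bool
    responsible_engineer: str
    design_requirements: list = field(default_factory=list)

entry = HazardRegisterEntry(
    hazard="Insulin overdose delivered to patient",
    identified_by="Jane Williams",
    criticality_class=1,
    risk="High",
    fault_tree_checked=True,
    responsible_engineer="James Brown",
    design_requirements=[
        "Self-testing software for the sensor system, clock, and delivery system",
        "Self-checking software executed once per minute",
    ],
)
```

Holding the responsible engineer as an explicit field supports the accountability point made above: every entry names who identified the hazard and who checked the analysis.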

Safety reviews are reviews of the software specification, design, and source code whose aim is to discover potentially hazardous conditions. These are not automated processes but involve people carefully checking for errors that have been made and for assumptions or omissions that may affect the safety of a system. For example, in the aircraft accident that I introduced earlier, a safety review might have questioned the assumption that an aircraft is on the ground when there is weight on both wheels and the wheels are rotating.

Safety reviews should be driven by the hazard register. For each of the identified hazards, a review team examines the system and judges whether or not it would cope with that hazard in a safe way. Any doubts raised are flagged in the review team’s report and have to be addressed by the system development team. I discuss reviews of different types in more detail in Chapter 24, which covers software quality assurance.

Software safety certification is used when external components are incorporated into a safety-critical system. When all parts of a system have been locally developed, complete information about the development processes used can be maintained. However, it is not cost-effective to develop components that are readily available from other vendors. The problem for safety-critical systems development is that these external components may have been developed to different standards than locally developed components. Their safety is unknown.

Consequently, it may be a requirement that all external components must be certified before they can be integrated with a system. The safety certification team, which is separate from the development team, carries out extensive verification and validation of the components. If appropriate, they liaise with the component developers to check that the developers have used dependable processes to create these components and to examine the component source code. Once the safety certification team is satisfied that a component meets its specification and does not have “hidden” functionality, they may issue a certificate allowing that component to be used in safety-critical systems.

Licensing of software engineers

In some areas of engineering, safety engineers must be licensed engineers. Inexperienced, poorly qualified engineers are not allowed to take responsibility for safety. In 30 states of the United States, there is some form of licensing for software engineers involved in safety-related systems development. These states require that engineers involved in safety-critical software development should be licensed engineers, with a defined minimum level of qualifications and experience. This is a controversial issue, and licensing is not required in many other countries.

http://software-engineering-book.com/safety-licensing/

12.3.2 Formal verification

Formal methods of software development, as I discussed in Chapter 10, rely on a formal model of the system that serves as a system specification. These formal methods are mainly concerned with mathematically analyzing the specification; with transforming the specification to a more detailed, semantically equivalent representation; or with formally verifying that one representation of the system is semantically equivalent to another representation.

The need for assurance in safety-critical systems has been one of the principal drivers in the development of formal methods. Comprehensive system testing is extremely expensive and cannot be guaranteed to uncover all of the faults in a system. This is particularly true of systems that are distributed, so that system components are running concurrently. Several safety-critical railway systems were developed using formal methods in the 1990s (Dehbonei and Mejia 1995; Behm et al. 1999). Companies such as Airbus routinely use formal methods in their software development for critical systems (Souyris et al. 2009).

Formal methods may be used at different stages in the V & V process:

1. A formal specification of the system may be developed and mathematically analyzed for inconsistency. This technique is effective in discovering specification errors and omissions. Model checking, discussed in the next section, is a particularly effective approach to specification analysis.

2. You can formally verify, using mathematical arguments, that the code of a software system is consistent with its specification. This requires a formal specification. It is effective in discovering programming and some design errors.

Because of the wide semantic gap between a formal system specification and program code, it is difficult and expensive to prove that a separately developed program is consistent with its specification. Work on program verification is now mostly based on transformational development. In a transformational development process, a formal specification is systematically transformed through a series of representations to program code. Software tools support the development of the transformations and help verify that corresponding representations of the system are consistent. The B method is probably the most widely used formal transformational method (Abrial 2010). It has been used for the development of train control systems and avionics software.

Advocates of formal methods claim that the use of these methods leads to more reliable and safer systems. Formal verification demonstrates that the developed program meets its specification and that implementation errors will not compromise the dependability of the system. If you develop a formal model of concurrent systems using a specification written in a language such as CSP (Schneider 1999), you can discover conditions that might result in deadlock in the final program, and you will be able to address these problems. This is very difficult to do by testing alone.

However, formal specification and proof do not guarantee that the software will be safe in practical use:

1. The specification may not reflect the real requirements of users and other system stakeholders. As I discussed in Chapter 10, system stakeholders rarely understand formal notations, so they cannot directly read the formal specification to find errors and omissions. This means that it is likely that the formal specification is not an accurate representation of the system requirements.

2. The proof may contain errors. Program proofs are large and complex, so, like large and complex programs, they usually contain errors.

3. The proof may make incorrect assumptions about the way that the system is used. If the system is not used as anticipated, then the system’s behavior lies outside the scope of the proof.

Verifying a nontrivial software system takes a great deal of time. It requires mathematical expertise and specialized software tools, such as theorem provers. It is an expensive process, and, as the system size increases, the costs of formal verification increase disproportionately.

Many software engineers therefore think that formal verification is not cost-effective. They believe that the same level of confidence in the system can be achieved more cheaply by using other validation techniques, such as inspections and system testing. However, companies such as Airbus that make use of formal verification claim that unit testing of components is not required, which leads to significant cost savings (Moy et al. 2013).

I am convinced that formal methods and formal verification have an important role to play in the development of critical software systems. Formal specifications are very effective in discovering some types of specification problems that may lead to system failure. Although formal verification remains impractical for large systems, it can be used to verify critical core components of safety- and security-critical systems.

[Figure 12.8 Model checking. A model-building activity takes the requirements, design, or program and produces an extended finite-state model of the system; a property specification activity captures the desired system properties. The model checker takes the state model and the property specification as inputs and outputs either confirmation or counterexamples.]

12.3.3 Model checking

Formally verifying programs using a deductive approach is difficult and expensive, but alternative approaches to formal analysis have been developed that are based on a more restricted notion of correctness. The most successful of these approaches is called model checking (Jhala and Majumdar 2009). Model checking involves creating a formal state model of a system and checking the correctness of that model using specialized software tools. The stages involved in model checking are shown in Figure 12.8.

Model checking has been widely used to check hardware systems designs. It is increasingly being used in critical software systems such as the control software in NASA’s Mars exploration vehicles (Regan and Hamilton 2004; Holzmann 2014) and by Airbus in avionics software development (Bochot et al. 2009).

Many different model-checking tools have been developed. SPIN was an early example of a software model checker (Holzmann 2003). More recent systems include SLAM from Microsoft (Ball, Levin, and Rajamani 2011) and PRISM (Kwiatkowska, Norman, and Parker 2011).

The models used by model-checking systems are extended finite-state models of the software. Models are expressed in the language of whatever model-checking system is used—for example, the SPIN model checker uses a language called Promela. A set of desirable system properties are identified and written in a formal notation, usually based on temporal logic. For example, in the wilderness weather system, a property to be checked might be that the system will always reach the “transmitting” state from the “recording” state.

The model checker then explores all paths through the model (i.e., all possible state transitions), checking that the property holds for each path. If it does, then the model checker confirms that the model is correct with respect to that property. If it does not hold for a particular path, the model checker outputs a counterexample illustrating where the property is not true. Model checking is particularly useful in the validation of concurrent systems, which are notoriously difficult to test because of their sensitivity to time. The checker can explore interleaved, concurrent transitions and discover potential problems.
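The exploration described above can be illustrated with a tiny explicit-state checker. This sketch checks only the simpler reachability property that the "transmitting" state can be reached from the "recording" state; the transition table is an invented fragment of the wilderness weather system, and a real model checker such as SPIN checks full temporal-logic properties and reports a counterexample path on failure.

```python
# Illustrative sketch of explicit-state exploration: breadth-first search
# over a finite-state model, checking that a goal state is reachable from
# a start state. The transition table is a made-up fragment of the
# wilderness weather system, not taken from the book.
from collections import deque

transitions = {
    "recording":    ["summarizing", "recording"],
    "summarizing":  ["transmitting"],
    "transmitting": ["recording"],
}

def reachable(model, start, goal):
    """Explore all state transitions from start; report whether goal is hit."""
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            return True
        for nxt in model.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

print(reachable(transitions, "recording", "transmitting"))  # True
```

The exhaustive frontier expansion here is the source of the state-explosion problem discussed below: the `seen` set grows with the number of reachable states, which in real systems grows combinatorially.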

A key issue in model checking is the creation of the system model. If the model has to be created manually (from a requirements or design document), it is an expensive process as model creation takes a great deal of time. In addition, there is the possibility that the model created will not be an accurate model of the requirements or design. It is therefore best if the model can be created automatically from the program source code. Model checkers are available that work directly from programs in Java, C, C++, and Ada.

Model checking is computationally very expensive because it uses an exhaustive approach to check all paths through the system model. As the size of a system increases, so too does the number of states, with a consequent increase in the number of paths to be checked. For large systems, therefore, model checking may be impractical, due to the computer time required to run the checks. However, better algorithms are under development that can identify parts of the state space that do not have to be explored when checking a particular property. As these algorithms are incorporated into model checkers, it will be increasingly possible to use model checking routinely in large-scale critical systems development.

12.3.4 Static program analysis

Automated static analyzers are software tools that scan the source text of a program and detect possible faults and anomalies. They parse the program text and thus recognize the different types of statements in a program. They can then detect whether or not statements are well formed, make inferences about the control flow in the program, and, in many cases, compute the set of all possible values for program data. They complement the error-detection facilities provided by the language compiler, and they can be used as part of the inspection process or as a separate V & V process activity.

Automated static analysis is faster and cheaper than detailed code reviews and is very effective in discovering some types of program faults. However, it cannot discover some classes of errors that could be identified in program inspection meetings.

Static analysis tools (Lopes, Vicente, and Silva 2009) work on the source code of a system, and, for some types of analysis at least, no further inputs are required. This means that programmers do not need to learn specialized notations to write program specifications, so the benefits of analysis can be immediately clear. This makes automated static analysis easier to introduce into a development process than formal verification or model checking.

The intention of automatic static analysis is to draw a code reader’s attention to anomalies in the program, such as variables that are used without initialization, variables that are unused, or data whose values could go out of range. Examples of the problems that can be detected by static analysis are shown in Figure 12.9. Of course, the specific checks made by the static analyzer are programming-language-specific and depend on what is and isn’t allowed in the language. Anomalies are often a result of programming errors or omissions, so they highlight things that could go wrong when the program is executed. However, these anomalies are not necessarily program faults; they may be deliberate constructs introduced by the programmer, or the anomaly may have no adverse consequences.
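As a sketch of this kind of anomaly detection, the following uses Python's ast module to flag one of the data faults listed in Figure 12.9: variables that are assigned but never subsequently used. It is illustrative only; production analyzers handle scoping, attributes, and control flow far more carefully.

```python
# Sketch of a static anomaly check: parse source text and flag variables
# that are assigned but never used (cf. the data faults in Figure 12.9).
import ast

def unused_variables(source):
    tree = ast.parse(source)
    assigned, used = {}, set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):   # appears as assignment target
                assigned[node.id] = node.lineno
            else:                                 # appears as a value (Load)
                used.add(node.id)
    return sorted(name for name in assigned if name not in used)

code = """
x = 1
y = 2
print(x)
"""
print(unused_variables(code))  # ['y']
```

Note that, as the text says, such a report is an anomaly rather than a definite fault: the programmer may have assigned `y` deliberately for later use.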

Figure 12.9 Automated static analysis checks

Data faults: Variables used before initialization; variables declared but never used; variables assigned twice but never used between assignments; possible array bound violations; undeclared variables
Control faults: Unreachable code; unconditional branches into loops
Input/output faults: Variables output twice with no intervening assignment
Interface faults: Parameter type mismatches; parameter number mismatches; nonusage of the results of functions; uncalled functions and procedures
Storage management faults: Unassigned pointers; pointer arithmetic; memory leaks

Three levels of checking may be implemented in static analyzers:

1. Characteristic error checking At this level, the static analyzer knows about common errors that are made by programmers in languages such as Java or C. The tool analyzes the code looking for patterns that are characteristic of that problem and highlights these to the programmer. Though relatively simple, analysis based on common errors can be very cost-effective. Zheng and his collaborators (Zheng et al. 2006) analyzed a large code base in C and C++. They discovered that 90% of the errors in the programs resulted from 10 types of characteristic error.

2. User-defined error checking In this approach, the users of the static analyzer define error patterns to be detected. These may relate to the application domain or may be based on knowledge of the specific system that is being developed. An example of an error pattern is “maintain ordering”; for example, method A must always be called before method B. Over time, an organization can collect information about common bugs that occur in their programs and extend the static analysis tools with error patterns to highlight these errors.

3. Assertion checking This is the most general and most powerful approach to static analysis. Developers include formal assertions (often written as stylized comments) in their program that state relationships that must hold at that point in a program. For example, the program might include an assertion stating that the value of some variable must lie in the range x..y. The analyzer symbolically executes the code and highlights statements where the assertion may not hold.
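A simple form of this symbolic execution can be sketched with interval analysis: track the possible range of each variable through straight-line arithmetic and check whether an asserted range is guaranteed to hold. The program shape, the input range, and the assertion below are all invented for illustration.

```python
# Sketch of assertion checking by interval analysis. Each operation has
# the form target = var * factor + offset; an assertion states that a
# variable must lie in the range lo..hi. All names and values here are
# invented examples, not taken from the book.

def check_range(input_ranges, operations, assertion):
    """Propagate (lo, hi) intervals and report whether the assertion must hold."""
    ranges = dict(input_ranges)
    for target, var, factor, offset in operations:
        lo, hi = ranges[var]
        bounds = sorted((lo * factor + offset, hi * factor + offset))
        ranges[target] = (bounds[0], bounds[1])
    name, lo, hi = assertion
    actual_lo, actual_hi = ranges[name]
    return lo <= actual_lo and actual_hi <= hi   # guaranteed to hold?

# Example: sensor reading in 1..30; dose = reading * 2 - 2; assert dose in 0..60
print(check_range({"sugar": (1, 30)},
                  [("dose", "sugar", 2, -2)],
                  ("dose", 0, 60)))  # True
```

When the analysis cannot prove the assertion (for instance, if the asserted upper bound were 50), the statement would be highlighted for the programmer, exactly the behavior described above.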

Static analysis is effective in finding errors in programs but, commonly, generates a large number of false positives. These are code sections where there are no errors but where the static analyzer’s rules have detected a potential for error. The number of false positives can be reduced by adding more information to the program in the form of assertions, but this requires additional work by the developer of the code. Work has to be done in screening out these false positives before the code itself can be checked for errors.

Many organizations now routinely use static analysis in their software development processes. Microsoft introduced static analysis in the development of device drivers where program failures can have a serious effect. They extended the approach across a much wider range of their software to look for security problems as well as errors that affect program reliability (Ball, Levin, and Rajamani 2011). Checking for well-known problems, such as buffer overflow, is effective for improving security as attackers often base their attacks on those common vulnerabilities. Attacks may target little-used code sections that may not have been thoroughly tested. Static analysis is a cost-effective way of finding these types of vulnerability.

12.4 Safety cases

As I have discussed, many safety-critical, software-intensive systems are regulated. An external authority has significant influence on their development and deployment. Regulators are government bodies whose job is to ensure that commercial companies do not deploy systems that pose threats to public and environmental safety or the national economy. The owners of safety-critical systems must convince regulators that they have made the best possible efforts to ensure that their systems are safe. The regulator assesses the safety case for the system, which presents evidence and arguments that normal operation of the system will not cause harm to a user.

This evidence is collected during the system development process. It may include information about hazard analysis and mitigation, test results, static analyses, information about the development processes used, records of review meetings, and so on. It is assembled and organized into a safety case, a detailed presentation of why the system owners and developers believe that a system is safe.

A safety case is a set of documents that includes a description of the system to be certified, information about the processes used to develop the system, and, critically, logical arguments that demonstrate that the system is likely to be safe. More succinctly, Bishop and Bloomfield (Bishop and Bloomfield 1998) define a safety case as:

A documented body of evidence that provides a convincing and valid argument that a system is adequately safe for a given application in a given environment.

The organization and contents of a safety case depend on the type of system that is to be certified and its context of operation. Figure 12.10 shows one possible structure for a safety case, but there are no universal industrial standards in this area. Safety case structures vary, depending on the industry and the maturity of the domain. For example, nuclear safety cases have been required for many years. They are very comprehensive and presented in a way that is familiar to nuclear engineers. However, safety cases for medical devices have been introduced more recently. The case structure is more flexible, and the cases themselves are less detailed than nuclear cases.

A safety case refers to a system as a whole, and, as part of that case, there may be a subsidiary software safety case. When constructing a software safety case, you have to relate software failures to wider system failures and demonstrate either that these software failures will not occur or that they will not be propagated in such a way that dangerous system failures may occur.

†Bishop, P., and R. E. Bloomfield. 1998. "A Methodology for Safety Case Development." In Proc. Safety-Critical Systems Symposium. Birmingham, UK: Springer. http://www.adelard.com/papers/sss98web.pdf

Figure 12.10 Possible contents of a software safety case

System description: An overview of the system and a description of its critical components.

Safety requirements: The safety requirements taken from the system requirements specification. Details of other relevant system requirements may also be included.

Hazard and risk analysis: Documents describing the hazards and risks that have been identified and the measures taken to reduce risk. Hazard analyses and hazard logs.

Design analysis: A set of structured arguments (see Section 12.4.1) that justify why the design is safe.

Verification and validation: A description of the V & V procedures used and, where appropriate, the test plans for the system. Summaries of the test results showing defects that have been detected and corrected. If formal methods have been used, a formal system specification and any analyses of that specification. Records of static analyses of the source code.

Review reports: Records of all design and safety reviews.

Team competences: Evidence of the competence of all of the team involved in safety-related systems development and validation.

Process QA: Records of the quality assurance processes (see Chapter 24) carried out during system development.

Change management processes: Records of all changes proposed, actions taken, and, where appropriate, justification of the safety of these changes. Information about configuration management procedures and configuration management logs.

Associated safety cases: References to other safety cases that may impact the safety case.

Safety cases are large and complex documents, and so they are very expensive to produce and maintain. Because of these high costs, safety-critical system developers have to take the requirements of the safety case into account in the development process:

1. Graydon et al. (Graydon, Knight, and Strunk 2007) argue that the development of a safety case should be tightly integrated with system design and implementation. This means that system design decisions may be influenced by the requirements of the safety case. Design choices that may add significantly to the difficulties and costs of case development can then be avoided.

2. Regulators have their own views on what is acceptable and unacceptable in a safety case. It therefore makes sense for a development team to work with them from early in the development to establish what the regulator expects from the system safety case.

The development of safety cases is expensive because of the costs of the record keeping required as well as the costs of comprehensive system validation and safety assurance processes. System changes and rework also add to the costs of a safety case. When software or hardware changes are made to a system, a large part of the safety case may have to be rewritten to demonstrate that the system safety has not been affected by the change.

Figure 12.11 Structured arguments (diagram: several items of evidence support an argument, which in turn justifies a claim)

12.4.1 Structured arguments

The decision on whether or not a system is operationally safe should be based on logical arguments. These arguments should demonstrate that the evidence presented supports the claims about a system's security and dependability. These claims may be absolute (event X will or will not happen) or probabilistic (the probability of occurrence of event Y is 0.n). An argument links the evidence and the claim. As shown in Figure 12.11, an argument is a relationship between what is thought to be the case (the claim) and a body of evidence that has been collected. The argument essentially explains why the claim, which is an assertion about system security or dependability, can be inferred from the available evidence.

Arguments in a safety case are usually presented as "claim based" arguments. Some claim about system safety is made, and, on the basis of available evidence, an argument is presented as to why that claim holds. For example, the following argument might be used to justify a claim that computations carried out by the control software in an insulin pump will not lead to an overdose of insulin being delivered. Of course, this is a very simplified presentation of the argument. In a real safety case, more detailed references to the evidence would be presented.

Claim: The maximum single dose computed by the insulin pump will not exceed maxDose, where maxDose has been assessed as a safe single dose for a particular patient.

Evidence: Safety argument for insulin pump software control program (covered later in this section).

Evidence: Test datasets for the insulin pump. In 400 tests, which provided complete code coverage, the value of the dose of insulin to be delivered, currentDose, never exceeded maxDose.

Evidence: A static analysis report for the insulin pump control program. The static analysis of the control software revealed no anomalies that affected the value of currentDose, the program variable that holds the dose of insulin to be delivered.

Figure 12.12 A safety claim hierarchy for the insulin pump (top-level claim: the insulin pump will not deliver a single dose of insulin that is unsafe; supported by the claims that the maximum single dose computed by the pump software will not exceed maxDose, that maxDose is set up correctly when the pump is configured, and that maxDose is a safe dose for the user of the insulin pump; the first of these is in turn supported by the claims that, in normal operation, the maximum dose computed will not exceed maxDose and that, if the software fails, the maximum dose computed will not exceed maxDose)

Argument: The evidence presented demonstrates that the maximum dose of insulin that can be computed is equal to maxDose.

It is therefore reasonable to assume, with a high level of confidence, that the evidence justifies the claim that the insulin pump will not compute a dose of insulin to be delivered that exceeds the maximum single safe dose.

The evidence presented is both redundant and diverse. The software is checked using several different mechanisms with significant overlap between them. As I discussed in Chapter 10, using redundant and diverse processes increases confidence. If omissions and mistakes are not detected by one validation process, there is a good chance that they will be found by one of the other processes.

There will normally be many claims about the safety of a system, with the validity of one claim often depending on whether or not other claims are valid. Therefore, claims may be organized in a hierarchy. Figure 12.12 shows part of this claim hierarchy for the insulin pump. To demonstrate that a high-level claim is valid, you first have to work through the arguments for lower-level claims. If you can show that each of these lower-level claims is justified, then you may be able to infer that the higher-level claims are justified.
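This bottom-up justification can be captured in a simple data structure. The sketch below is an illustration only: the Claim class and its argument_holds flag are invented for this example, and real safety cases use richer notations such as goal-structuring notation. A claim is justified only when its own argument holds and every one of its subclaims is justified:

```python
# Minimal sketch of a claim hierarchy: a higher-level claim is justified
# only when its own argument holds and all lower-level claims are justified.
class Claim:
    def __init__(self, text, argument_holds=True, subclaims=()):
        self.text = text
        self.argument_holds = argument_holds  # does the supporting argument hold?
        self.subclaims = list(subclaims)

    def justified(self):
        # Recursively check this claim and everything beneath it.
        return self.argument_holds and all(c.justified() for c in self.subclaims)

# Part of the insulin pump hierarchy shown in Figure 12.12:
normal_op = Claim("In normal operation, the computed dose will not exceed maxDose")
on_failure = Claim("If the software fails, the computed dose will not exceed maxDose")
max_dose = Claim("The maximum single dose computed will not exceed maxDose",
                 subclaims=[normal_op, on_failure])
top = Claim("The pump will not deliver an unsafe single dose of insulin",
            subclaims=[max_dose,
                       Claim("maxDose is set up correctly when the pump is configured"),
                       Claim("maxDose is a safe dose for the user of the pump")])
```

If the argument for any lower-level claim fails (say, on_failure.argument_holds is set to False), top.justified() becomes False, mirroring the dependency between claims described above.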

12.4.2 Software safety arguments

A general assumption that underlies work in system safety is that the number of system faults that can lead to safety hazards is significantly less than the total number of faults that may exist in the system. Safety assurance can therefore concentrate on these faults, which have hazard potential. If it can be demonstrated that these faults cannot occur or, if they occur, that the associated hazard will not result in an accident, then the system is safe. This is the basis of software safety arguments.

Software safety arguments are a type of structured argument which demonstrates that a program meets its safety obligations. In a safety argument, it is not necessary to prove that the program works as intended. It is only necessary to show that program execution cannot result in it reaching a potentially unsafe state. Safety arguments are therefore cheaper to make than correctness arguments. You don't have to consider all program states; you can simply concentrate on states that could lead to a hazard.

Safety arguments demonstrate that, assuming normal execution conditions, a program should be safe. They are usually based on proof by contradiction: you assume that an unsafe state can be reached and then show that this assumption leads to a contradiction.

The steps involved in creating a safety argument are:

1. You start by assuming that an unsafe state, which has been identified by the system hazard analysis, can be reached by executing the program.

2. You write a predicate (a logical expression) that defines this unsafe state.

3. You then systematically analyze a system model or the program and show that, for all program paths leading to that state, the terminating condition of these paths, also defined as a predicate, contradicts the unsafe state predicate. If this is the case, you may then claim that the initial assumption of an unsafe state is incorrect.

4. When you have repeated this analysis for all identified hazards, then you have strong evidence that the system is safe.

Safety arguments can be applied at different levels, from requirements through design models to code. At the requirements level, you are trying to demonstrate that there are no missing safety requirements and that the requirements do not make invalid assumptions about the system. At the design level, you might analyze a state model of the system to find unsafe states. At the code level, you consider all of the paths through the safety-critical code to show that the execution of all paths leads to a contradiction.

As an example, consider the code outlined in Figure 12.13, which is a simplified description of part of the implementation of the insulin delivery system. The code computes the dose of insulin to be delivered and then applies some safety checks that this is not an overdose for that patient. Developing a safety argument for this code involves demonstrating that the dose of insulin administered is never greater than the maximum safe level for a single dose. This dose is established for each individual diabetic user in discussions with their medical advisors.

To demonstrate safety, you do not have to prove that the system delivers the "correct" dose, but merely that it never delivers an overdose to the patient. You work on the assumption that maxDose is the safe level for that system user.

To construct the safety argument, you identify the predicate that defines the unsafe state, which is that currentDose > maxDose. You then demonstrate that all program paths lead to a contradiction of this unsafe assertion. If this is the case, the unsafe condition cannot be true. If you can prove a contradiction, you can be confident that the program will not compute an unsafe dose of insulin. You can structure and present the safety arguments graphically as shown in Figure 12.14.

Figure 12.13 Insulin dose computation with safety checks:

// The insulin dose to be delivered is a function of
// blood sugar level, the previous dose delivered and
// the time of delivery of the previous dose
currentDose = computeInsulin () ;

// Safety check: adjust currentDose if necessary.
// if statement 1
if (previousDose == 0)
{
   if (currentDose > maxDose/2)
      currentDose = maxDose/2 ;
}
else if (currentDose > (previousDose * 2))
   currentDose = previousDose * 2 ;

// if statement 2
if (currentDose < minimumDose)
   currentDose = 0 ;
else if (currentDose > maxDose)
   currentDose = maxDose ;

administerInsulin (currentDose) ;
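The checks in Figure 12.13 can also be transcribed into a short executable sketch. This is a Python rendering for illustration only: computeInsulin is replaced by an arbitrary candidate value so that every possible computed dose can be exercised, and the numeric bounds are invented:

```python
# Python transcription of the Figure 12.13 safety checks.
def checked_dose(current_dose, previous_dose, min_dose, max_dose):
    # if statement 1: limit how fast the dose may grow
    if previous_dose == 0:
        if current_dose > max_dose / 2:
            current_dose = max_dose / 2
    elif current_dose > previous_dose * 2:
        current_dose = previous_dose * 2
    # if statement 2: clamp the dose into the safe range
    if current_dose < min_dose:
        current_dose = 0
    elif current_dose > max_dose:
        current_dose = max_dose
    return current_dose  # the value passed to administerInsulin

# Brute-force supporting evidence for the safety claim: whatever dose
# the computation proposes, the administered dose never exceeds maxDose.
MIN_DOSE, MAX_DOSE = 2, 20  # invented bounds for illustration
for previous in range(0, 50):
    for candidate in range(0, 200):
        assert checked_dose(candidate, previous, MIN_DOSE, MAX_DOSE) <= MAX_DOSE
```

Exhaustive testing over a finite grid like this is evidence rather than proof; the structured argument in the text covers all executions by reasoning over program paths instead.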

The safety argument shown in Figure 12.14 presents three possible program paths that lead to the call to the administerInsulin method. You have to show that the amount of insulin delivered never exceeds maxDose. All possible program paths to administerInsulin are considered:

1. Neither branch of if-statement 2 is executed. This can only happen if currentDose lies within the range minimumDose..maxDose. The postcondition predicate is therefore:

currentDose >= minimumDose and currentDose <= maxDose

2. The then-branch of if-statement 2 is executed. In this case, the assignment setting currentDose to zero is executed. Therefore, its postcondition predicate is currentDose = 0.

3. The else-if-branch of if-statement 2 is executed. In this case, the assignment setting currentDose to maxDose is executed. Therefore, after this statement has been executed, we know that the postcondition is currentDose = maxDose.

In all three cases, the postcondition predicates contradict the unsafe precondition that currentDose > maxDose. As both cannot be true, we can claim that our initial assumption was incorrect, and so the computation is safe.

Figure 12.14 Informal safety argument based on demonstrating contradictions (the diagram shows the precondition for the unsafe state, currentDose > maxDose, being contradicted on each of the three paths to administerInsulin: "if statement 2 not executed" establishes currentDose >= minimumDose and currentDose <= maxDose; the then branch establishes currentDose = 0; the else branch establishes currentDose = maxDose)

To construct a structured argument that a program does not make an unsafe computation, you first identify all possible paths through the code that could lead to a potentially unsafe assignment. You work backwards from the unsafe state and consider the last assignment to all of the state variables on each path leading to this unsafe state. If you can show that none of the values of these variables is unsafe, then you have shown that your initial assumption (that the computation is unsafe) is incorrect.

Working backwards is important because it means that you can ignore all intermediate states apart from the final states that lead to the exit condition for the code. The previous values don't matter to the safety of the system. In this example, all you need be concerned with is the set of possible values of currentDose immediately before the administerInsulin method is executed. You can ignore computations, such as if-statement 1 in Figure 12.13, in the safety argument because their results are overwritten in later program statements.

Key Points

Safety-critical systems are systems whose failure can lead to human injury or death.

A hazard-driven approach may be used to understand the safety requirements for safety-critical systems. You identify potential hazards and decompose them (using methods such as fault tree analysis) to discover their root causes. You then specify requirements to avoid or recover from these problems.

It is important to have a well-defined, certified process for safety-critical systems development. The process should include the identification and monitoring of potential hazards.

Static analysis is an approach to V & V that examines the source code (or other representation) of a system, looking for errors and anomalies. It allows all parts of a program to be checked, not just those parts that are exercised by system tests.

Model checking is a formal approach to static analysis that exhaustively checks all states in a system for potential errors.

Safety and dependability cases collect all of the evidence that demonstrates a system is safe and dependable. Safety cases are required when an external regulator must certify the system before it is used.

Further Reading

Safeware: System Safety and Computers. Although now 20 years old, this book still offers the best and most thorough coverage of safety-critical systems. It is particularly strong in its description of hazard analysis and the derivation of requirements from it. (N. Leveson, Addison-Wesley, 1995)

"Safety-Critical Software." A special edition of IEEE Software magazine that focuses on safety-critical systems. It includes papers on model-based development of safety-critical systems, model checking, and formal methods. (IEEE Software, 30 (3), May/June 2013)

"Constructing Safety Assurance Cases for Medical Devices." This short paper gives a practical example of how a safety case can be created for an analgesic pump. (A. Ray and R. Cleaveland, Proc. Workshop on Assurance Cases for Software-Intensive Systems, San Francisco, 2013) http://dx.doi.org/10.1109/ASSURE.2013.6614270

Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/reliability-and-safety/

Exercises

12.1. Identify six consumer products that are likely to be controlled by safety-critical software systems.

12.2. A software system is to be deployed for a company that has extremely high safety standards and allows for almost no risks, not even minor injuries. How will this affect the look of the risk triangle in Figure 12.3?

12.3. In the insulin pump system, the user has to change the needle and insulin supply at regular intervals and may also change the maximum single dose and the maximum daily dose that may be administered. Suggest three user errors that might occur and propose safety requirements that would avoid these errors resulting in an accident.

12.4. A safety-critical software system for managing roller coasters controls two main components:

The lock and release of the roller coaster harness, which is supposed to keep riders in place as the coaster performs sharp and sudden moves. The roller coaster cannot move with any unlocked harnesses.

The minimum and maximum speeds of the roller coaster as it moves along the various segments of the ride to prevent derailing, given the number of people riding the roller coaster.

Identify three hazards that may arise in this system. For each hazard, suggest a defensive requirement that will reduce the probability that these hazards will result in an accident. Explain why your suggested defense is likely to reduce the risk associated with the hazard.

12.5. A train protection system automatically applies the brakes of a train if the speed limit for a segment of track is exceeded, or if the train enters a track segment that is currently signaled with a red light (i.e., the segment should not be entered). There are two critical safety requirements for this train protection system:

The train shall not enter a segment of track that is signaled with a red light.

The train shall not exceed the specified speed limit for a section of track.

Assuming that the signal status and the speed limit for the track segment are transmitted to on-board software on the train before it enters the track segment, propose five possible functional system requirements for the onboard software that may be generated from the system safety requirements.

12.6. Explain when it may be cost-effective to use formal specification and verification in the development of safety-critical software systems. Why do you think that some critical systems engineers are against the use of formal methods?

12.7. Explain why using model checking is sometimes a more cost-effective approach to verification than verifying a program's correctness against a formal specification.

12.8. List four types of systems that may require software safety cases, explaining why safety cases are required.

12.9. The door lock control mechanism in a nuclear waste storage facility is designed for safe operation. It ensures that entry to the storeroom is only permitted when radiation shields are in place or when the radiation level in the room falls below some given value (dangerLevel). So:

(i) If remotely controlled radiation shields are in place within a room, an authorized operator may open the door.

(ii) If the radiation level in a room is below a specified value, an authorized operator may open the door.

(iii) An authorized operator is identified by the input of an authorized door entry code.

The code shown in Figure 12.15 controls the door-locking mechanism. Note that the safe state is that entry should not be permitted. Using the approach discussed in this chapter, develop a safety argument for this code. Use the line numbers to refer to specific statements. If you find that the code is unsafe, suggest how it should be modified to make it safe.

Figure 12.15 Door entry code:

1   entryCode = lock.getEntryCode () ;
2   if (entryCode == lock.authorizedCode)
3   {
4      shieldStatus = Shield.getStatus ();
5      radiationLevel = RadSensor.get ();
6      if (radiationLevel < dangerLevel)
7         state = safe;
8      else
9         state = unsafe;
10     if (shieldStatus == Shield.inPlace() )
11        state = safe;
12     if (state == safe)
13     {
14        Door.locked = false ;
15        Door.unlock () ;
16     }
17     else
18     {
19        Door.lock ( );
20        Door.locked := true ;
21     }
22  }

12.10. Should software engineers working on the specification and development of safety-related systems be professionally certified or licensed in some way? Explain your reasoning.

References

Abrial, J. R. 2010. Modeling in Event-B: System and Software Engineering. Cambridge, UK: Cambridge University Press.

Ball, T., V. Levin, and S. K. Rajamani. 2011. "A Decade of Software Model Checking with SLAM." Communications of the ACM 54 (7) (July 1): 68. doi:10.1145/1965724.1965743.

Behm, P., P. Benoit, A. Faivre, and J-M. Meynadier. 1999. "Meteor: A Successful Application of B in a Large Project." In Formal Methods'99, 369–387. Berlin: Springer-Verlag. doi:10.1007/3-540-48119-2_22.

Bishop, P., and R. E. Bloomfield. 1998. "A Methodology for Safety Case Development." In Proc. Safety-Critical Systems Symposium. Birmingham, UK: Springer. http://www.adelard.com/papers/sss98web.pdf

Bochot, T., P. Virelizier, H. Waeselynck, and V. Wiels. 2009. "Model Checking Flight Control Systems: The Airbus Experience." In Proc. 31st International Conf. on Software Engineering, Companion Volume, 18–27. Leipzig: IEEE Computer Society Press. doi:10.1109/ICSE-COMPANION.2009.5070960.

Dehbonei, B., and F. Mejia. 1995. "Formal Development of Safety-Critical Software Systems in Railway Signalling." In Applications of Formal Methods, edited by M. Hinchey and J. P. Bowen, 227–252. London: Prentice-Hall.

Graydon, P. J., J. C. Knight, and E. A. Strunk. 2007. "Assurance Based Development of Critical Systems." In Proc. 37th Annual IEEE Conf. on Dependable Systems and Networks, 347–357. Edinburgh, Scotland. doi:10.1109/DSN.2007.17.

Holzmann, G. J. 2014. "Mars Code." Comm. ACM 57 (2): 64–73. doi:10.1145/2560217.2560218.

Jhala, R., and R. Majumdar. 2009. "Software Model Checking." Computing Surveys 41 (4). doi:10.1145/1592434.1592438.

Kwiatkowska, M., G. Norman, and D. Parker. 2011. "PRISM 4.0: Verification of Probabilistic Real-Time Systems." In Proc. 23rd Int. Conf. on Computer Aided Verification, 585–591. Snowbird, UT: Springer-Verlag. doi:10.1007/978-3-642-22110-1_47.

Leveson, N. G., S. S. Cha, and T. J. Shimeall. 1991. "Safety Verification of Ada Programs Using Software Fault Trees." IEEE Software 8 (4): 48–59. doi:10.1109/52.300036.

Lopes, R., D. Vicente, and N. Silva. 2009. "Static Analysis Tools, a Practical Approach for Safety-Critical Software Verification." In Proceedings of DASIA 2009 Data Systems in Aerospace. Noordwijk, Netherlands: European Space Agency.

Lutz, R. R. 1993. "Analysing Software Requirements Errors in Safety-Critical Embedded Systems." In RE'93, 126–133. San Diego, CA: IEEE. doi:10.1109/ISRE.1993.324825.

Moy, Y., E. Ledinot, H. Delseny, V. Wiels, and B. Monate. 2013. "Testing or Formal Verification: DO-178C Alternatives and Industrial Experience." IEEE Software 30 (3) (May 1): 50–57. doi:10.1109/MS.2013.43.

Perrow, C. 1984. Normal Accidents: Living with High-Risk Technology. New York: Basic Books.

Regan, P., and S. Hamilton. 2004. "NASA's Mission Reliable." IEEE Computer 37 (1): 59–68. doi:10.1109/MC.2004.1260727.

Schneider, S. 1999. Concurrent and Real-Time Systems: The CSP Approach. Chichester, UK: John Wiley & Sons.

Souyris, J., V. Weils, D. Delmas, and H. Delseny. 2009. "Formal Verification of Avionics Software Products." In Formal Methods'09: Proceedings of the 2nd World Congress on Formal Methods, 532–546. Springer-Verlag. doi:10.1007/978-3-642-05089-3_34.

Storey, N. 1996. Safety-Critical Computer Systems. Harlow, UK: Addison-Wesley.

Veras, P. C., E. Villani, A. M. Ambrosio, N. Silva, M. Vieira, and H. Madeira. 2010. "Errors in Space Software Requirements: A Field Study and Application Scenarios." In 21st Int. Symp. on Software Reliability Engineering. San Jose, CA. doi:10.1109/ISSRE.2010.37.

Zheng, J., L. Williams, N. Nagappan, W. Snipes, J. P. Hudepohl, and M. A. Vouk. 2006. "On the Value of Static Analysis for Fault Detection in Software." IEEE Trans. on Software Eng. 32 (4): 240–253. doi:10.1109/TSE.2006.38.

13 Security engineering

Objectives

The objective of this chapter is to introduce security issues that you should consider when you are developing application systems. When you have read this chapter, you will:

understand the importance of security engineering and the difference between application security and infrastructure security;

know how a risk-based approach can be used to derive security requirements and analyze system designs;

know of software architectural patterns and design guidelines for secure systems engineering;

understand why security testing and assurance is difficult and expensive.

Contents

13.1 Security and dependability
13.2 Security and organizations
13.3 Security requirements
13.4 Secure systems design
13.5 Security testing and assurance

The widespread adoption of the Internet in the 1990s introduced a new challenge for software engineers: designing and implementing systems that were secure. As more and more systems were connected to the Internet, a variety of different external attacks were devised to threaten these systems. The problems of producing dependable systems were hugely increased. Systems engineers had to consider threats from malicious and technically skilled attackers as well as problems resulting from accidental mistakes in the development process.

It is now essential to design systems to withstand external attacks and to recover from such attacks. Without security precautions, attackers will inevitably compromise a networked system. They may misuse the system hardware, steal confidential data, or disrupt the services offered by the system.

You have to take three security dimensions into account in secure systems engineering:

1. Confidentiality Information in a system may be disclosed or made accessible to people or programs that are not authorized to have access to that information. For example, the theft of credit card data from an e-commerce system is a confidentiality problem.

2. Integrity Information in a system may be damaged or corrupted, making it unusable or unreliable. For example, a worm that deletes data in a system is an integrity problem.

3. Availability Access to a system or its data that is normally available may not be possible. A denial-of-service attack that overloads a server is an example of a situation where the system availability is compromised.

These dimensions are closely related. If an attack makes the system unavailable, then you will not be able to update information that changes with time. This means that the integrity of the system may be compromised. If an attack succeeds and the integrity of the system is compromised, then it may have to be taken down to repair the problem. Therefore, the availability of the system is reduced.

From an organizational perspective, security has to be considered at three levels:

1. Infrastructure security, which is concerned with maintaining the security of all systems and networks that provide an infrastructure and a set of shared services to the organization.

2. Application security, which is concerned with the security of individual application systems or related groups of systems.

3. Operational security, which is concerned with the secure operation and use of the organization's systems.

Figure 13.1 is a diagram of an application system stack that shows how an application system relies on an infrastructure of other systems in its operation. The lower levels of the infrastructure are hardware, but the software infrastructure for application systems may include:

Chapter 13 Security engineering 375

Figure 13.1 System layers where security may be compromised. From top to bottom: application; reusable components and libraries; middleware; database management; generic, shared applications (browsers, email, etc.); operating system; network; computer hardware.

an operating system platform, such as Linux or Windows;
other generic applications that run on that system, such as web browsers and email clients;
a database management system;
middleware that supports distributed computing and database access; and
libraries of reusable components that are used by the application software.

Network systems are software controlled, and networks may be subject to security threats where an attacker intercepts and reads or changes network packets. However, this requires specialized equipment, so the majority of security attacks are on the software infrastructure of systems. Attackers focus on software infrastructures because infrastructure components, such as web browsers, are universally available. Attackers can probe these systems for weaknesses and share information about vulnerabilities that they have discovered. As many people use the same software, attacks have wide applicability.

Infrastructure security is primarily a system management problem, where system managers configure the infrastructure to resist attacks. System security management includes a range of activities such as user and permission management, system software deployment and maintenance, and attack monitoring, detection, and recovery:

1. User and permission management involves adding and removing users from the system, ensuring that appropriate user authentication mechanisms are in place, and setting up the permissions in the system so that users only have access to the resources they need.

2. System software deployment and maintenance involves installing system software and middleware and configuring these properly so that security vulnerabilities are avoided. It also involves updating this software regularly with new versions or patches, which repair security problems that have been discovered.


3. Attack monitoring, detection, and recovery involves monitoring the system for unauthorized access, detecting and putting in place strategies for resisting attacks, and organizing backups of programs and data so that normal operation can be resumed after an external attack.

Operational security is primarily a human and social issue. It focuses on ensuring that the people using the system do not behave in a way that compromises system security. For example, users may leave themselves logged on to a system while it is unattended, allowing an attacker to easily gain access. Users often behave insecurely because it helps them do their jobs more effectively, so they have good reason for this behavior. A challenge for operational security is to raise awareness of security issues and to find the right balance between security and system effectiveness.

The term cybersecurity is now commonly used in discussions of system security. Cybersecurity is a very wide-ranging term that covers all aspects of the protection of citizens, businesses, and critical infrastructures from threats that arise from their use of computers and the Internet. Its scope includes all system levels, from hardware and networks through application systems to mobile devices that may be used to access these systems. I discuss general cybersecurity issues, including infrastructure security, in Chapter 14, which covers resilience engineering.

In this chapter, I focus on issues of application security engineering: security requirements, design for security, and security testing. I don't cover general security techniques, such as encryption and access control mechanisms, or attack vectors, such as viruses and worms. General textbooks on computer security (Pfleeger and Pfleeger 2007; Anderson 2008; Stallings and Brown 2012) discuss these techniques in detail.

13.1 Security and dependability

Security is a system attribute that reflects the ability of the system to protect itself from malicious internal or external attacks. These external attacks are possible because most computers and mobile devices are networked and are therefore accessible by outsiders. Examples of attacks might be the installation of viruses and Trojan horses, unauthorized use of system services, or unauthorized modification of a system or its data.

If you really want a system to be as secure as possible, it is best not to connect it to the Internet. Then, your security problems are limited to ensuring that authorized users do not abuse the system and to controlling the use of devices such as USB drives. In practice, however, networked access provides huge benefits for most systems, so disconnecting from the Internet is not a viable security option.

For some systems, security is the most important system dependability attribute. Military systems, systems for electronic commerce, and systems that involve the processing and interchange of confidential information must be designed so that


Figure 13.2 Security terminology

Asset: Something of value that has to be protected. The asset may be the software system itself or the data used by that system.

Attack: An exploitation of a system's vulnerability where an attacker has the goal of causing some damage to a system asset or assets. Attacks may be from outside the system (external attacks) or from authorized insiders (insider attacks).

Control: A protective measure that reduces a system's vulnerability. Encryption is an example of a control that reduces a vulnerability of a weak access control system.

Exposure: Possible loss or harm to a computing system. This can be loss or damage to data or can be a loss of time and effort if recovery is necessary after a security breach.

Threat: Circumstances that have potential to cause loss or harm. You can think of a threat as a system vulnerability that is subjected to an attack.

Vulnerability: A weakness in a computer-based system that may be exploited to cause loss or harm.

Figure 13.3 A security story for the Mentcare system

Unauthorized access to the Mentcare system

Clinic staff log on to the Mentcare system using a username and password. The system requires passwords to be at least eight letters long but allows any password to be set without further checking. A criminal finds out that a well-paid sports star is receiving treatment for mental health problems. He would like to gain illegal access to information in this system so that he can blackmail the star.

By posing as a concerned relative and talking with the nurses in the mental health clinic, he discovers how to access the system and personal information about the nurses and their families. By checking name badges, he discovers the names of some of the people allowed access. He then attempts to log on to the system by using these names and systematically guessing possible passwords, such as the names of the nurses' children.

they achieve a high level of security. If an airline reservation system is unavailable, for example, this causes inconvenience and some delays in issuing tickets. However, if the system is insecure, then an attacker could delete all bookings, and it would be practically impossible for normal airline operations to continue.

As with other aspects of dependability, a specialized terminology is associated with security (Pfleeger and Pfleeger 2007). This terminology is explained in Figure 13.2. Figure 13.3 is a security story from the Mentcare system that I use to illustrate some of these terms. Figure 13.4 takes the security concepts defined in Figure 13.2 and shows how they apply to this security story.

System vulnerabilities may arise because of requirements, design, or implementation problems, or they may stem from human, social, or organizational failings. People may choose easy-to-guess passwords or write down their passwords in places where they can be found. System administrators make errors in setting up access control or configuration files, and users don't install or use protection software. However, we cannot simply class these problems as human errors. User mistakes or omissions often reflect poor systems design decisions that require, for example, frequent password changes (so that users write down their passwords) or complex configuration mechanisms.


Figure 13.4 Examples of security terminology

Asset: The record of each patient who is receiving or has received treatment.

Attack: An impersonation of an authorized user.

Control: A password checking system that disallows user passwords that are proper names or words that are normally included in a dictionary.

Exposure: Potential financial loss from future patients who do not seek treatment because they do not trust the clinic to maintain their data. Financial loss from legal action by the sports star. Loss of reputation.

Threat: An unauthorized user will gain access to the system by guessing the credentials (login name and password) of an authorized user.

Vulnerability: Authentication is based on a password system that does not require strong passwords. Users can then set easily guessable passwords.

Four types of security threats may arise:

1. Interception threats that allow an attacker to gain access to an asset. So, a possible threat to the Mentcare system might be a situation where an attacker gains access to the records of an individual patient.

2. Interruption threats that allow an attacker to make part of the system unavailable. Therefore, a possible threat might be a denial-of-service attack on a system database server.

3. Modification threats that allow an attacker to tamper with a system asset. In the Mentcare system, a modification threat would be where an attacker alters or destroys a patient record.

4. Fabrication threats that allow an attacker to insert false information into a system. This is perhaps not a credible threat in the Mentcare system but would certainly be a threat in a banking system, where false transactions might be added to the system that transfer money to the perpetrator's bank account.

The controls that you might put in place to enhance system security are based on the fundamental notions of avoidance, detection, and recovery:

1. Vulnerability avoidance Controls that are intended to ensure that attacks are unsuccessful. The strategy here is to design the system so that security problems are avoided. For example, sensitive military systems are not connected to the Internet so that external access is more difficult. You should also think of encryption as a control based on avoidance: even if an attacker gains unauthorized access to encrypted data, they cannot read it without the encryption key, and it is expensive and time consuming to crack strong encryption.

2. Attack detection and neutralization Controls that are intended to detect and repel attacks. These controls involve including functionality in a system that monitors its operation and checks for unusual patterns of activity. If attacks are detected, then action may be taken, such as shutting down parts of the system or restricting access to certain users.

3. Exposure limitation and recovery Controls that support recovery from problems. These can range from automated backup strategies and information "mirroring" through to insurance policies that cover the costs associated with a successful attack on the system.
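A minimal sketch of a detection-and-neutralization control of the kind described above, assuming that repeated failed logins are the "unusual pattern of activity" being monitored. The class name and the threshold of three failures are hypothetical choices, not standards.

```python
from collections import defaultdict

# Sketch of an attack detection and neutralization control: count failed
# login attempts per account and lock the account once a threshold is
# crossed. The threshold of 3 is an illustrative choice.
FAILURE_THRESHOLD = 3

class LoginMonitor:
    def __init__(self):
        self.failures = defaultdict(int)
        self.locked = set()

    def record_failure(self, user: str) -> None:
        self.failures[user] += 1
        if self.failures[user] >= FAILURE_THRESHOLD:
            self.locked.add(user)   # neutralize: restrict access

    def record_success(self, user: str) -> None:
        self.failures[user] = 0     # normal activity resets the counter

    def is_locked(self, user: str) -> bool:
        return user in self.locked

monitor = LoginMonitor()
for _ in range(3):
    monitor.record_failure("nurse1")
print(monitor.is_locked("nurse1"))  # True: unusual activity detected
```

A production system would also log the events, alert an administrator, and unlock accounts through a controlled recovery procedure.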

Security is closely related to the other dependability attributes of reliability, availability, safety, and resilience:

1. Security and reliability If a system is attacked and the system or its data are corrupted as a consequence of that attack, then this may induce system failures that compromise the reliability of the system. Errors in the development of a system can lead to security loopholes. If a system does not reject unexpected inputs or if array bounds are not checked, then attackers can exploit these weaknesses to gain access to the system. For example, failure to check the validity of an input may mean that an attacker can inject and execute malicious code.

2. Security and availability A common attack on a web-based system is a denial-of-service attack, where a web server is flooded with service requests from a range of different sources. The aim of this attack is to make the system unavailable. A variant of this attack is where a profitable site is threatened with this type of attack unless a ransom is paid to the attackers.

3. Security and safety Again, the key problem is an attack that corrupts the system or its data. Safety checks are based on the assumption that we can analyze the source code of safety-critical software and that the executing code is a completely accurate translation of that source code. If this is not the case, because an attacker has changed the executing code, safety-related failures may be induced and the safety case made for the software is invalid.

Like safety, we cannot assign a numeric value to the security of a system, nor can we exhaustively test the system for security. Both safety and security can be thought of as "negative" or "shall not" characteristics in that they are concerned with things that should not happen. As we can never prove a negative, we can never prove that a system is safe or secure.

4. Security and resilience Resilience, covered in Chapter 14, is a system characteristic that reflects its ability to resist and recover from damaging events. The most probable damaging event on networked software systems is a cyberattack of some kind, so most of the work now done in resilience is aimed at deterring, detecting, and recovering from such attacks.
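The input-validation weakness noted under security and reliability can be countered by accepting only inputs that match an explicit allow-list, rather than trying to filter out "bad" characters. A minimal sketch, in which the patient-identifier format is a hypothetical example:

```python
import re

# Sketch of defensive input validation: accept only inputs that match an
# explicit allow-list pattern. The patient-ID format (a 'P' followed by
# six digits) is a hypothetical example, not the Mentcare format.
PATIENT_ID_PATTERN = re.compile(r"P\d{6}")

def validate_patient_id(raw: str) -> str:
    """Return the input only if it matches the expected format; otherwise
    raise, so malformed (possibly malicious) data never reaches the
    database layer."""
    if not PATIENT_ID_PATTERN.fullmatch(raw):
        raise ValueError("invalid patient identifier")
    return raw

print(validate_patient_id("P123456"))          # accepted
try:
    validate_patient_id("P1'; DROP TABLE--")   # injection attempt
except ValueError:
    print("rejected")
```

Validation of this kind complements, but does not replace, parameterized queries and bounds checking in the layers below.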

Security has to be maintained if we are to create reliable, available, and safe software-intensive systems. It is not an add-on that can be included later; it has to be considered at all stages of the development life cycle, from early requirements to system operation.


13.2 Security and organizations

Building secure systems is expensive and uncertain. It is impossible to predict the costs of a security failure, so companies and other organizations find it difficult to judge how much they should spend on system security. In this respect, security and safety are different. There are laws that govern workplace and operator safety, and developers of safety-critical systems have to comply with these irrespective of the costs. They may be subject to legal action if they use an unsafe system. However, unless a security failure discloses personal information, there are no laws that prevent an insecure system from being deployed.

Companies assess the risks and losses that may arise from certain types of attacks on system assets. They may then decide that it is cheaper to accept these risks rather than build a secure system that can deter or repel the external attacks. Credit card companies apply this approach to fraud prevention. It is usually possible to introduce new technology to reduce credit card fraud. However, it is often cheaper for these companies to compensate users for their losses due to fraud than to buy and deploy fraud-reduction technology.

Security risk management is therefore a business rather than a technical issue. It has to take into account the financial and reputational losses from a successful system attack as well as the costs of security procedures and technologies that may reduce these losses. For risk management to be effective, organizations should have a documented information security policy that sets out:

1. The assets that must be protected It does not necessarily make sense to apply stringent security procedures to all organizational assets. Many assets are not confidential, and a company can improve its image by making these assets freely available. The costs of maintaining the security of information that is in the public domain are much less than the costs of keeping confidential information secure.

2. The level of protection that is required for different types of assets Not all assets need the same level of protection. In some cases (e.g., for sensitive personal information), a high level of security is required; for other information, the consequences of loss may be minor, so a lower level of security is adequate. Therefore, some information may be made available to any authorized and logged-in user; other information may be much more sensitive and only available to users in certain roles or positions of responsibility.

3. The responsibilities of individual users, managers, and the organization The security policy should set out what is expected of users: for example, use strong passwords, log out of computers, and lock offices. It also defines what users can expect from the company, such as backup and information-archiving services, and equipment provision.

4. Existing security procedures and technologies that should be maintained For reasons of practicality and cost, it may be essential to continue to use existing approaches to security even where these have known limitations. For example, a company may require the use of a login name/password for authentication, simply because other approaches are likely to be rejected by users.

Security policies often set out general information access strategies that should apply across the organization. For example, an access strategy may be based on the clearance or seniority of the person accessing the information. Therefore, a military security policy may state: "Readers may only examine documents whose classification is the same as or below the reader's vetting level." This means that if a reader has been vetted to a "secret" level, he or she may access documents that are classed as secret, confidential, or open but not documents classed as top secret.
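This "read at or below your vetting level" rule can be expressed as a simple ordering check over classification levels. A minimal sketch (the numeric encoding of the levels is an implementation choice):

```python
# Sketch of the access rule quoted above: a reader may only examine
# documents classified at or below their own vetting level.
# The integer values simply encode the ordering of the levels.
LEVELS = {"open": 0, "confidential": 1, "secret": 2, "top secret": 3}

def may_read(reader_vetting: str, document_class: str) -> bool:
    """True if the document's classification is the same as or below
    the reader's vetting level."""
    return LEVELS[document_class] <= LEVELS[reader_vetting]

print(may_read("secret", "confidential"))  # True: at or below vetting level
print(may_read("secret", "top secret"))    # False: classified above
```
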

The point of security policies is to inform everyone in an organization about security, so these should not be long and detailed technical documents. From a security engineering perspective, the security policy defines, in broad terms, the security goals of the organization. The security engineering process is concerned with implementing these goals.

13.2.1 Security risk assessment

Security risk assessment and management are organizational activities that focus on identifying and understanding the risks to information assets (systems and data) in the organization. In principle, an individual risk assessment should be carried out for all assets; in practice, however, this may be impractical if a large number of existing systems and databases need to be assessed. In those situations, a generic assessment may be applied to all of them. However, individual risk assessments should be carried out for new systems.

Risk assessment and management is an organizational activity rather than a technical activity that is part of the software development life cycle. The reason for this is that some types of attack are not technology-based but rather rely on weaknesses in more general organizational security. For example, an attacker may gain access to equipment by pretending to be an accredited engineer. If an organization has a process to check with the equipment supplier that an engineer's visit is planned, this can deter this type of attack. This approach is much simpler than trying to address the problem using a technological solution.

When a new system is to be developed, security risk assessment and management should be a continuing process throughout the development life cycle, from initial specification to operational use. The stages of risk assessment are:

1. Preliminary risk assessment The aim of this initial risk assessment is to identify generic risks that are applicable to the system and to decide if an adequate level of security can be achieved at a reasonable cost. At this stage, decisions on the detailed system requirements, the system design, or the implementation technology have not been made. You don't know of potential technology vulnerabilities or the controls that are included in reused system components or middleware. The risk assessment should therefore focus on the identification and analysis of high-level risks to the system. The outcomes of the risk assessment process are used to help identify security requirements.


2. Design risk assessment This risk assessment takes place during the system development life cycle and is informed by the technical system design and implementation decisions. The results of the assessment may lead to changes to the security requirements and the addition of new requirements. Known and potential vulnerabilities are identified, and this knowledge is used to inform decision making about the system functionality and how it is to be implemented, tested, and deployed.

3. Operational risk assessment This risk assessment process focuses on the use of the system and the possible risks that can arise. For example, when a system is used in an environment where interruptions are common, a security risk is that a logged-in user leaves his or her computer unattended to deal with a problem. To counter this risk, a timeout requirement may be specified so that a user is automatically logged out after a period of inactivity.
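The inactivity timeout described above can be sketched as a session object that records the time of its last activity. The 300-second period is an illustrative value, not a recommendation, and the injectable clock is purely a testing convenience:

```python
import time

# Sketch of an inactivity timeout: a session records the time of its last
# activity and is treated as expired once a period of inactivity passes.
TIMEOUT_SECONDS = 300  # illustrative value

class Session:
    def __init__(self, clock=time.monotonic):
        self._clock = clock            # injectable clock for testing
        self._last_activity = clock()

    def touch(self):
        """Record user activity, resetting the inactivity timer."""
        self._last_activity = self._clock()

    def expired(self) -> bool:
        return self._clock() - self._last_activity > TIMEOUT_SECONDS

# A simulated clock shows the behavior without waiting five minutes.
now = [0.0]
session = Session(clock=lambda: now[0])
now[0] = 301.0
print(session.expired())  # True: user would be automatically logged out
```

In a real system, the expiry check would be enforced server-side on every request, not just in the client.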

Operational risk assessment should continue after a system has been installed to take account of how the system is actually used and of proposals for new and changed requirements. Assumptions made about the operating environment when the system was specified may be incorrect. Organizational changes may mean that the system is used in different ways from those originally planned. These changes lead to new security requirements that have to be implemented as the system evolves.

13.3 Security requirements

The specification of security requirements for systems has much in common with the specification of safety requirements. You cannot specify safety or security requirements as probabilities. Like safety requirements, security requirements are often "shall not" requirements that define unacceptable system behavior rather than required system functionality.

However, security is a more challenging problem than safety, for a number of reasons:

1. When considering safety, you can assume that the environment in which the system is installed is not hostile. No one is trying to cause a safety-related incident. When considering security, you have to assume that attacks on the system are deliberate and that the attacker may have knowledge of system weaknesses.

2. When system failures occur that pose a risk to safety, you look for the errors or omissions that have caused the failure. When deliberate attacks cause system failure, finding the root cause may be more difficult as the attacker may try to conceal the cause of the failure.

3. It is usually acceptable to shut down a system or to degrade system services to avoid a safety-related failure. However, attacks on a system may be denial-of-service attacks, which are intended to compromise system availability. Shutting down the system means that the attack has been successful.


4. Safety-related events are accidental and are not created by an intelligent adversary. An attacker can probe a system's defenses in a series of attacks, modifying the attacks as he or she learns more about the system and its responses.

These distinctions mean that security requirements have to be more extensive than safety requirements. Safety requirements lead to the generation of functional system requirements that provide protection against events and faults that could cause safety-related failures. These requirements are mostly concerned with checking for problems and taking actions if these problems occur. By contrast, many types of security requirements cover the different threats faced by a system.

Firesmith (Firesmith 2003) identified 10 types of security requirements that may be included in a system specification:

1. Identification requirements specify whether or not a system should identify its users before interacting with them.

2. Authentication requirements specify how users are identified.

3. Authorization requirements specify the privileges and access permissions of identified users.

4. Immunity requirements specify how a system should protect itself against viruses, worms, and similar threats.

5. Integrity requirements specify how data corruption can be avoided.

6. Intrusion detection requirements specify what mechanisms should be used to detect attacks on the system.

7. Nonrepudiation requirements specify that a party in a transaction cannot deny its involvement in that transaction.

8. Privacy requirements specify how data privacy is to be maintained.

9. Security auditing requirements specify how system use can be audited and checked.

10. System maintenance security requirements specify how an application can prevent authorized changes from accidentally defeating its security mechanisms.

Of course, you will not see all of these types of security requirements in every system. The particular requirements depend on the type of system, the situation of use, and the expected users.
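To make the authentication requirements above more concrete: one widely used mechanism is to store only a salted, iterated hash of each password rather than the password itself, so that a stolen credential store does not directly reveal passwords. A sketch using Python's standard library; the iteration count is illustrative, not a recommendation:

```python
import hashlib
import hmac
import secrets

# Sketch of password-based authentication storage: never keep the password
# itself, only a salted, iterated hash. The iteration count is illustrative.
ITERATIONS = 100_000

def make_credential(password: str) -> tuple[bytes, bytes]:
    """Create a fresh random salt and the derived hash to store."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the hash and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = make_credential("x9!tr4-Qz")
print(verify("x9!tr4-Qz", salt, digest))  # True
print(verify("guess", salt, digest))      # False
```

The constant-time comparison avoids leaking information through timing differences; a per-user random salt defeats precomputed dictionary attacks.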

Preliminary risk assessment and analysis aim to identify the generic security risks for a system and its associated data. This risk assessment is an important input to the security requirements engineering process. Security requirements can be proposed to support the general risk management strategies of avoidance, detection, and mitigation:

1. Risk avoidance requirements set out the risks that should be avoided by designing the system so that these risks simply cannot arise.


Figure 13.5 The preliminary risk assessment process for security requirements: asset identification, followed by asset value assessment and exposure assessment; then threat identification, attack assessment, control identification, and feasibility assessment; and finally security requirements definition.

2. Risk detection requirements define mechanisms that identify the risk if it arises and neutralize the risk before losses occur.

3. Risk mitigation requirements set out how the system should be designed so that it can recover from and restore system assets after some loss has occurred.

A risk-driven security requirements process is shown in Figure 13.5. The process stages are:

1. Asset identification, where the system assets that may require protection are identified. The system itself or particular system functions may be identified as assets, as well as the data associated with the system.

2. Asset value assessment, where you estimate the value of the identified assets.

3. Exposure assessment, where you assess the potential losses associated with each asset. This process should take into account direct losses such as the theft of information, the costs of recovery, and the possible loss of reputation.

4. Threat identification, where you identify the threats to system assets.

5. Attack assessment, where you decompose each threat into attacks that might be made on the system and the possible ways in which these attacks may occur. You may use attack trees (Schneier 1999) to analyze the possible attacks. These are similar to fault trees (Chapter 12), as you start with a threat at the root of the tree and then identify possible causal attacks and how these might be made.

6. Control identification, where you propose the controls that might be put in place to protect an asset. The controls are the technical mechanisms, such as encryption, that you can use to protect assets.

7. Feasibility assessment, where you assess the technical feasibility and the costs of the proposed controls. It is not worth having expensive controls to protect assets that don't have a high value.


8. Security requirements definition, where knowledge of the exposure, threats, and control assessments is used to derive system security requirements. These requirements may apply to the system infrastructure or the application system.

Figure 13.6 Asset analysis in a preliminary risk assessment report for the Mentcare system

Asset: The information system. Value: High; required to support all clinical consultations; potentially safety critical. Exposure: High; financial loss as clinics may have to be canceled; costs of restoring the system; possible patient harm if treatment cannot be prescribed.

Asset: The patient database. Value: High; required to support all clinical consultations; potentially safety critical. Exposure: High; financial loss as clinics may have to be canceled; costs of restoring the system; possible patient harm if treatment cannot be prescribed.

Asset: An individual patient record. Value: Normally low, although may be high for specific high-profile patients. Exposure: Low direct losses but possible loss of reputation.
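The attack trees (Schneier 1999) used in the attack assessment step can be modeled as AND/OR trees whose root is a threat and whose leaves are individual attacks. A minimal sketch, with a purely illustrative tree and feasibility flags for the Mentcare password-guessing threat:

```python
# Sketch of an attack tree: the root is a threat, internal nodes combine
# sub-attacks with AND/OR, and leaves mark individual attacks as feasible
# or not. The example tree and its flags are purely illustrative.
class Leaf:
    def __init__(self, name, feasible):
        self.name, self.feasible = name, feasible
    def achievable(self):
        return self.feasible

class Node:
    def __init__(self, name, mode, children):
        assert mode in ("AND", "OR")
        self.name, self.mode, self.children = name, mode, children
    def achievable(self):
        results = [c.achievable() for c in self.children]
        return all(results) if self.mode == "AND" else any(results)

threat = Node("read patient record", "OR", [
    Node("impersonate authorized user", "AND", [
        Leaf("learn a login name", True),
        Leaf("guess the password", True),
    ]),
    Leaf("break database encryption", False),
])
print(threat.achievable())  # True: via the impersonation branch
```

Walking the tree shows which attack branches make the root threat achievable, which helps direct controls at the cheapest attack path.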

The Mentcare patient management system is a security-critical system. Figures 13.6 and 13.7 are fragments of a report that documents the risk analysis of that software system. Figure 13.6 is an asset analysis that describes the assets in the system and their value. Figure 13.7 shows some of the threats that a system may face.

Once a preliminary risk assessment has been completed, requirements can be proposed that aim to avoid, detect, and mitigate risks to the system. However, creating these requirements is not a formulaic or automated process. It requires inputs from both engineers and domain experts to suggest requirements based on their understanding of the risk analysis and the functional requirements of the software system. Some examples of the Mentcare system security requirements and associated risks are:

1. Patient information shall be downloaded, at the start of a clinic session, from the database to a secure area on the system client.
Risk: Damage from denial-of-service attack. Maintaining local copies means that access is still possible.

2. All patient information on the system client shall be encrypted.
Risk: External access to patient records. If data is encrypted, then an attacker must have access to the encryption key to discover patient information.

3. Patient information shall be uploaded to the database when a clinic session is over and deleted from the client computer.
Risk: External access to patient records through a stolen laptop.

4. A log of all changes made to the system database and the initiator of these changes shall be maintained on a separate computer from the database server.
Risk: Insider or external attacks that corrupt current data. A log should allow up-to-date records to be re-created from a backup.
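The recovery mechanism behind the fourth requirement, replaying a change log over a backup, can be sketched as follows. The change-record format (op, key, value, user) is a hypothetical illustration, not the Mentcare format:

```python
# Sketch of log-based recovery: every change to the database is appended
# to a log (kept on a separate machine), and an up-to-date state can be
# re-created by replaying the log over the last good backup. The record
# format (op, key, value, user) is a hypothetical illustration.
def apply_change(state: dict, change: dict) -> None:
    if change["op"] == "set":
        state[change["key"]] = change["value"]
    elif change["op"] == "delete":
        state.pop(change["key"], None)

def recover(backup: dict, log: list) -> dict:
    state = dict(backup)        # start from the backup, leaving it intact
    for change in log:          # replay logged changes in order
        apply_change(state, change)
    return state

backup = {"P123456": "record v1"}
log = [
    {"op": "set", "key": "P123456", "value": "record v2", "user": "nurse1"},
    {"op": "set", "key": "P654321", "value": "record v1", "user": "doctor2"},
]
print(recover(backup, log))
```

Because each record also names its initiator, the same log supports the auditing side of the requirement: it is possible to discover who made each change.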


Figure 13.7 Threat and control analysis in a preliminary risk assessment report

Threat: An unauthorized user gains access as system manager and makes the system unavailable. Probability: Low. Control: Only allow system management from specific locations that are physically secure. Feasibility: Low cost of implementation, but care must be taken with key distribution and to ensure that keys are available in the event of an emergency.

Threat: An unauthorized user gains access as system user to confidential information. Probability: High. Control: Require all users to authenticate themselves using a biometric mechanism. Feasibility: Technically feasible but high-cost solution; possible user resistance. Alternative control: Log all changes to patient information to track system usage. Feasibility: Simple and transparent to implement and also supports recovery.

The first two requirements are related—patient information is downloaded to a local machine so that consultations may continue if the patient database server is attacked or becomes unavailable. However, this information must be deleted so that later users of the client computer cannot access the information. The fourth requirement is a recovery and auditing requirement. It means that changes can be recovered by replaying the change log and that it is possible to discover who has made the changes. This accountability discourages misuse of the system by authorized staff.

13.3.1 Misuse cases

The derivation of security requirements from a risk analysis is a creative process involving engineers and domain experts. One approach that has been developed to support this process for users of the UML is the idea of misuse cases (Sindre and Opdahl 2005). Misuse cases are scenarios that represent malicious interactions with a system. You can use these scenarios to discuss and identify possible threats and, therefore, also determine the system’s security requirements. They can be used alongside use cases when deriving the system requirements (Chapters 4 and 5).

Misuse cases are associated with use case instances and represent threats or attacks associated with these use cases. They may be included in a use case diagram but should also have a more complete and detailed textual description. In Figure 13.8, I have taken the use cases for a medical receptionist using the Mentcare system and have added misuse cases. These are normally represented as black ellipses.

As with use cases, misuse cases can be described in several ways. I think that it is most helpful to describe them as a supplement to the original use case description. I also think it is best to have a flexible format for misuse cases, as different types of attack have to be described in different ways. Figure 13.9 shows the original description of the Transfer Data use case (Figure 5.4), with the addition of a misuse case description.

The problem with misuse cases mirrors the general problem of use cases, which is that interactions between end-users and a system do not capture all of the system requirements. Misuse cases can be used as part of the security requirements engineering process, but you also need to consider risks that are associated with system stakeholders who do not interact directly with the system.

Figure 13.8 Misuse cases
(Use case diagram: the Medical receptionist is associated with the use cases Register patient, Unregister patient, View patient info., Transfer data, and Contact patient. The Attacker is associated with the misuse cases Impersonate receptionist and Intercept transfer, shown as black ellipses.)

Mentcare system: Transfer data
Actors: Medical receptionist, Patient records system (PRS)
Description: A receptionist may transfer data from the Mentcare system to a general patient record database that is maintained by a health authority. The information transferred may either be updated personal information (address, phone number, etc.) or a summary of the patient’s diagnosis and treatment.
Data: Patient’s personal information, treatment summary
Stimulus: User command issued by medical receptionist
Response: Confirmation that PRS has been updated
Comments: The receptionist must have appropriate security permissions to access the patient information and the PRS.

Mentcare system: Intercept transfer (Misuse case)
Actors: Medical receptionist, Patient records system (PRS), Attacker
Description: A receptionist transfers data from his or her PC to the Mentcare system on the server. An attacker intercepts the data transfer and takes a copy of that data.
Data (assets): Patient’s personal information, treatment summary
Attacks: A network monitor is added to the system, and packets from the receptionist to the server are intercepted. A spoof server is set up between the receptionist and the database server so that the receptionist believes they are interacting with the real system.
Mitigations: All networking equipment must be maintained in a locked room. Engineers accessing the equipment must be accredited. All data transfers between the client and server must be encrypted. Certificate-based client–server communication must be used.
Requirements: All communications between the client and the server must use the Secure Socket Layer (SSL). The https protocol uses certificate-based authentication and encryption.

Figure 13.9 Misuse case descriptions

13.4 Secure systems design

It is very difficult to add security to a system after it has been implemented. Therefore, you need to take security issues into account during the systems design process and make design choices that enhance the security of a system. In this section, I focus on two application-independent issues relevant to secure systems design:

1. Architectural design—how do architectural design decisions affect the security of a system?

2. Good practice—what is accepted good practice when designing secure systems?

Of course, these are not the only design issues that are important for security. Every application is different, and security design also has to take into account the purpose, criticality, and operational environment of the application. For example, if you are designing a military system, you need to adopt their security classification model (secret, top secret, etc.). If you are designing a system that maintains personal information, you may have to take into account data protection legislation that places restrictions on how data is managed.

Using redundancy and diversity, which is essential for dependability, may mean that a system can resist and recover from attacks that target specific design or implementation characteristics. Mechanisms to support a high level of availability may help the system to recover from denial-of-service attacks, where the aim of an attacker is to bring down the system and stop it from working properly.

Designing a system to be secure inevitably involves compromises. It is usually possible to design multiple security measures into a system that will reduce the chances of a successful attack. However, these security measures may require additional computation and so affect the overall performance of the system. For example, you can reduce the chances of confidential information being disclosed by encrypting that information. However, this means that users of the information have to wait for it to be decrypted, which may slow down their work.

There are also tensions between security and usability—another emergent system property. Security measures sometimes require the user to remember and provide additional information (e.g., multiple passwords). However, sometimes users forget this information, so the additional security means that they can’t use the system.

System designers have to find a balance between security, performance, and usability. This depends on the type of system being developed, the expectations of its users, and its operational environment. For example, in a military system, users are familiar with high-security systems and so accept and follow processes that require frequent checks. In a system for stock trading, where speed is essential, interruptions of operation for security checks would be completely unacceptable.

Denial-of-service attacks

Denial-of-service attacks attempt to bring down a networked system by bombarding it with a huge number of service requests, usually from hundreds of attacking systems. These place a load on the system for which it was not designed, and they exclude legitimate requests for system service. Consequently, the system may become unavailable, either because it crashes under the heavy load or because it has to be taken offline by system managers to stop the flow of requests.

http://software-engineering-book.com/web/denial-of-service/

13.4.1 Design risk assessment

Security risk assessment during requirements engineering identifies a set of high-level security requirements for a system. However, as the system is designed and implemented, architectural and technology decisions made during the system design process influence the security of a system. These decisions generate new design requirements and may mean that existing requirements have to change.

System design and the assessment of design-related risks are interleaved processes (Figure 13.10). Preliminary design decisions are made, and the risks associated with these decisions are assessed. This assessment may lead to new requirements to mitigate the risks that have been identified or design changes to reduce these risks.

As the system design evolves and is developed in more detail, the risks are reassessed and the results are fed back to the system designers. The design risk assessment process ends when the design is complete and the remaining risks are acceptable.

When assessing risks during design and implementation, you have more information about what needs to be protected, and you also will know something about the vulnerabilities in the system. Some of these vulnerabilities will be inherent in the design choices made. For example, an inherent vulnerability in password-based authentication is that an authorized user reveals their password to an unauthorized user. So, if password-based authentication is used, the risk assessment process may suggest new requirements to mitigate the risk. For example, there may be a requirement for multifactor authentication where users must authenticate themselves using some personal knowledge as well as a password.

Figure 13.10 Interleaved design and risk assessment
(Process model: the system requirements feed the architectural design; the architectural design and the technology choices are inputs to design risk assessment, which produces design assets and design and requirements changes that feed back into the system design.)

Figure 13.11 Design risk assessment
(Process model: the design assets undergo asset value assessment and exposure assessment, leading to threat identification and attack assessment; the technology and architecture choices and the available controls inform control identification, which results in design and requirements changes.)

Figure 13.11 is a model of the design risk assessment process. The key difference between preliminary risk analysis and design risk assessment is that, at the design stage, you now have information about information representation and distribution and the database organization for the high-level assets that have to be protected. You also know about important design decisions such as the software to be reused, infrastructure controls and protection, and so forth. Based on this information, your assessment can identify changes to the security requirements and the system design to provide additional protection for the important system assets.

Two examples from the Mentcare system illustrate how protection requirements are influenced by decisions on information representation and distribution:

1. You may make a design decision to separate personal patient information and information (design assets) about treatments received, with a key linking these records. The treatment information is technical and so much less sensitive than the personal patient information. If the key is protected, then an attacker will only be able to access routine information, without being able to link this to an individual patient.

2. Assume that, at the beginning of a session, a design decision is made to copy patient records to a local client system. This allows work to continue if the server is unavailable. It makes it possible for a healthcare worker to access patient records from a laptop, even if no network connection is available. However, you now have two sets of records to protect, and the client copies are subject to additional risks, such as theft of the laptop computer. You therefore have to think about what controls should be used to reduce risk; for example, you may include a requirement that client records held on laptops or other personal computers must be encrypted.
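The first design decision above, separating personal and treatment information with a linking key, can be sketched in a few lines. The table layout, field names, and function below are illustrative assumptions rather than the Mentcare design; the point is that the treatment table is useless for identifying individuals without the protected key.

```python
import secrets

# Hypothetical sketch: personal details and treatment information are held
# in separate tables, linked only by a random key.
personal_info = {}   # key -> identifying data (strongly protected)
treatments = {}      # key -> clinical data (much less sensitive on its own)

def store_patient(name, address, treatment):
    key = secrets.token_hex(16)                     # random linking key
    personal_info[key] = {"name": name, "address": address}
    treatments[key] = {"treatment": treatment}
    return key

key = store_patient("A. Patient", "12 High Street", "weekly CBT sessions")

# An attacker who obtains only the treatments table sees routine clinical
# data but cannot link it to an individual without the protected key.
```

If the linking keys are stored and protected separately from both tables, compromising the treatment data alone discloses no patient identities.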

Figure 13.12 Vulnerabilities associated with technology choices

Technology choice: Login/password authentication
Vulnerabilities: Users set guessable passwords. Authorized users reveal their passwords to unauthorized users.

Technology choice: Client/server architecture using web browser
Vulnerabilities: Server subject to denial-of-service attack. Confidential information may be left in browser cache. Browser security loopholes lead to unauthorized access.

Technology choice: Use of editable web forms
Vulnerabilities: Fine-grain logging of changes is impossible. Authorization can’t be varied according to user’s role.

To illustrate how decisions on development technologies influence security, assume that the health care provider has decided to build the Mentcare system using an off-the-shelf information system for maintaining patient records. This system has to be configured for each type of clinic in which it is used. This decision has been made because it appears to offer the most extensive functionality for the lowest development cost and fastest deployment time.

When you develop an application by reusing an existing system, you have to accept the design decisions made by the developers of that system. Let us assume that some of these design decisions are:

1. System users are authenticated using a login name/password combination. No other authentication method is supported.

2. The system architecture is client–server, with clients accessing data through a standard web browser on a client computer.

3. Information is presented to users as an editable web form. They can change information in place and upload the revised information to the server.

For a generic system, these design decisions are perfectly acceptable, but design risk assessment shows that they have associated vulnerabilities. Examples of these possible vulnerabilities are shown in Figure 13.12.

Once vulnerabilities have been identified, you then have to decide what steps you can take to reduce the associated risks. This will often involve making decisions about additional system security requirements or the operational process of using the system. Examples of these requirements might be:

1. A password checker program shall be made available and shall be run daily to check all user passwords. User passwords that appear in the system dictionary shall be identified, and users with weak passwords shall be reported to system administrators.

2. Access to the system shall only be allowed to client computers that have been approved and registered with the system administrators.

3. Only one approved web browser shall be installed on client computers.

As an off-the-shelf system is used, it isn’t possible to include a password checker in the application system itself, so a separate system must be used. Password checkers analyze the strength of user passwords when they are set up and notify users if they have chosen weak passwords. Therefore, vulnerable passwords can be identified reasonably quickly after they have been set up, and action can then be taken to ensure that users change their password.
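A stand-alone checker of the kind described in requirement 1 might look like the sketch below. The dictionary, the length rule, and the user list are all illustrative assumptions; a real checker would use a much larger dictionary and stronger heuristics.

```python
# Illustrative password checker: flags dictionary words and short passwords,
# producing a report for the system administrators.

DICTIONARY = {"password", "letmein", "qwerty", "mentcare"}  # tiny example dictionary

def is_weak(password):
    """A password is weak if it is a dictionary word or trivially short."""
    return password.lower() in DICTIONARY or len(password) < 8

users = {"rx17": "letmein", "dk05": "v9!kQz#mTp2"}   # hypothetical accounts

# Daily run: report the users whose passwords fail the checks.
weak_users = sorted(u for u, pw in users.items() if is_weak(pw))
```

Because the checker runs outside the off-the-shelf application, it only needs read access to the credential store (or to password-change events), not changes to the reused system itself.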

The second and third requirements mean that all users will always access the system through the same browser. You can decide what is the most secure browser when the system is deployed and install that on all client computers. Security updates are simplified because there is no need to update different browsers when security vulnerabilities are discovered and fixed.

The process model shown in Figure 13.10 assumes a design process where the design is developed to a fairly detailed level before implementation begins. This is not the case for agile processes, where the design and the implementation are developed together, with the code refactored as the design is developed. Frequent delivery of system increments does not allow time for a detailed risk assessment, even if information on assets and technology choices is available.

The issues surrounding security and agile development have been widely discussed (Lane 2010; Schoenfield 2013). So far, the issue has not really been resolved—some people think that a fundamental conflict exists between security and agile development, and others believe that this conflict can be resolved using security-focused stories (Safecode 2012). This remains an outstanding problem for developers of agile methods. Meanwhile, many security-conscious companies refuse to use agile methods because they conflict with their security and risk analysis policies.

13.4.2 Architectural design

Software architecture design decisions can have profound effects on the emergent properties of a software system. If an inappropriate architecture is used, it may be very difficult to maintain the confidentiality and integrity of information in the system or to guarantee a required level of system availability.

Figure 13.13 A layered protection architecture
(The patient records sit at the center, surrounded by three protection layers: record-level protection, providing record access authorization, record encryption, and record integrity management; application-level protection, providing database login, database authorization, transaction management, and database recovery; and platform-level protection, providing system authentication, system authorization, and file integrity management.)

In designing a system architecture that maintains security, you need to consider two fundamental issues:

1. Protection—how should the system be organized so that critical assets can be protected against external attack?

2. Distribution—how should system assets be distributed so that the consequences of a successful attack are minimized?

These issues are potentially conflicting. If you put all your assets in one place, then you can build layers of protection around them. As you only have to build a single protection system, you may be able to afford a strong system with several protection layers. However, if that protection fails, then all your assets are compromised. Adding several layers of protection also affects the usability of a system, so it may mean that it is more difficult to meet system usability and performance requirements.

On the other hand, if you distribute assets, they are more expensive to protect because protection systems have to be implemented for each distributed asset. Typically, then, you cannot afford to implement as many protection layers, and the chances are greater that the protection will be breached. However, if this happens, you don’t suffer a total loss. It may be possible to duplicate and distribute information assets so that if one copy is corrupted or inaccessible, then the other copy can be used. However, if the information is confidential, keeping additional copies increases the risk that an intruder will gain access to this information.

For the Mentcare system, a client–server architecture with a shared central database is used. To provide protection, the system has a layered architecture with the critical protected assets at the lowest level in the system. Figure 13.13 illustrates this multilevel system architecture, in which the critical assets to be protected are the records of individual patients.

To access and modify patient records, an attacker has to penetrate three system layers:

1. Platform-level protection. The top level controls access to the platform on which the patient record system runs. This usually involves a user signing on to a particular computer. The platform will also normally include support for maintaining the integrity of files on the system, backups, and so on.

2. Application-level protection. The next protection level is built into the application itself. It involves a user accessing the application, being authenticated, and getting authorization to take actions such as viewing or modifying data. Application-specific integrity management support may be available.

3. Record-level protection. This level is invoked when access to specific records is required, and it involves checking that a user is authorized to carry out the requested operations on that record. Protection at this level might also involve encryption to ensure that records cannot be browsed using a file browser. Integrity checking using, for example, cryptographic checksums can detect changes that have been made outside the normal record update mechanisms.
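Record-level integrity checking with a cryptographic checksum can be sketched as follows. The key handling is deliberately simplified for illustration (in practice the integrity key would be managed by the platform, not held in a program variable), and the record format is invented.

```python
import hashlib
import hmac
import secrets

# Sketch: store an HMAC checksum alongside each record so that changes made
# outside the normal update mechanism can be detected.
INTEGRITY_KEY = secrets.token_bytes(32)   # simplified key handling, for illustration

def checksum(record):
    return hmac.new(INTEGRITY_KEY, record, hashlib.sha256).digest()

def verify(record, mac):
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(checksum(record), mac)

record = b"patient p1: treatment summary, version 3"
stored_mac = checksum(record)               # saved alongside the record

ok = verify(record, stored_mac)             # the untouched record passes
tampered = verify(b"patient p1: treatment summary, version 4", stored_mac)
```

An edit made with a file browser or low-level tool changes the record bytes but cannot produce a matching checksum without the key, so the change is detected on the next verification.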

The number of protection layers that you need in any particular application depends on the criticality of the data. Not all applications need protection at the record level, and, therefore, coarser-grain access control is more commonly used. To achieve security, you should not allow the same user credentials to be used at each level. Ideally, if you have a password-based system, then the application password should be different from both the system password and the record-level password. However, multiple passwords are difficult for users to remember, and they find repeated requests to authenticate themselves irritating. Therefore, you often have to compromise on security in favor of system usability.

If protection of data is a critical requirement, then a centralized client–server architecture is usually the most effective security architecture. The server is responsible for protecting sensitive data. However, if the protection is compromised, then the losses associated with an attack are high, as all data may be lost or damaged. Recovery costs may also be high (e.g., all user credentials may have to be reissued).

Centralized systems are also more vulnerable to denial-of-service attacks, which overload the server and make it impossible for anyone to access the system database.

If the consequences of a server breach are high, you may decide to use an alternative distributed architecture for the application. In this situation, the system’s assets are distributed across a number of different platforms, with separate protection mechanisms used for each of these platforms. An attack on one node might mean that some assets are unavailable, but it would still be possible to provide some system services. Data can be replicated across the nodes in the system so that recovery from attacks is simplified.

Figure 13.14 illustrates the architecture of a banking system for trading in stocks and funds on the New York, London, Frankfurt, and Hong Kong markets. The system is distributed so that data about each market is maintained separately. Assets required to support the critical activity of equity trading (user accounts and prices) are replicated and available on all nodes. If a node of the system is attacked and becomes unavailable, the critical activity of equity trading can be transferred to another country and so can still be available to users.

Figure 13.14 Distributed assets in an equity trading system
(Four trading systems, in New York, London, Frankfurt, and Hong Kong, each sit behind their own authentication and authorization layer. Each node holds its local user accounts plus international user accounts, its local trading history, its local equity and funds data, and a replicated copy of international equity prices.)

I have already discussed the problem of finding a balance between security and system performance. A problem of secure system design is that in many cases, the architectural style that is best for the security requirements may not be the best one for meeting the performance requirements. For example, say an application has an absolute requirement to maintain the confidentiality of a large database and another requirement for very fast access to that data. A high level of protection suggests that layers of protection are required, which means that there must be communications between the system layers. This has an inevitable performance overhead and so will slow down access to the data.

If an alternative architecture is used, then implementing protection and guaranteeing confidentiality may be more difficult and expensive. In such a situation, you have to discuss the inherent conflicts with the customer who is paying for the system and agree on how these conflicts are to be resolved.


13.4.3 Design guidelines

There are no easy ways to ensure system security. Different types of systems require different technical measures to achieve a level of security that is acceptable to the system owner. The attitudes and requirements of different groups of users profoundly affect what is and is not acceptable. For example, in a bank, users are likely to accept a higher level of security, and hence more intrusive security procedures, than, say, in a university.

However, some general guidelines have wide applicability when designing system security solutions. These guidelines encapsulate good design practice for secure systems engineering. General design guidelines for security, such as those discussed below, have two principal uses:

1. They help raise awareness of security issues in a software engineering team. Software engineers often focus on the short-term goal of getting the software working and delivered to customers. It is easy for them to overlook security issues. Knowledge of these guidelines can mean that security issues are considered when software design decisions are made.

2. They can be used as a review checklist in the system validation process. From the high-level guidelines discussed here, more specific questions can be derived that explore how security has been engineered into a system.

Security guidelines are sometimes very general principles such as “Secure the weakest link in a system,” “Keep it simple,” and “Avoid security through obscurity.” I think these general guidelines are too vague to be of real use in the design process. Consequently, I have focused here on more specific design guidelines. The 10 design guidelines, summarized in Figure 13.15, have been taken from different sources (Schneier 2000; Viega and McGraw 2001; Wheeler 2004).

Guideline 1: Base security decisions on an explicit security policy

An organizational security policy is a high-level statement that sets out fundamental security conditions for an organization. It defines the “what” of security rather than the “how,” so the policy should not define the mechanisms to be used to provide and enforce security.

In principle, all aspects of the security policy should be reflected in the system requirements. In practice, especially if agile development is used, this is unlikely to happen.

Designers should use the security policy as a framework for making and evaluating design decisions. For example, say you are designing an access control system for the Mentcare system. The hospital security policy may state that only accredited clinical staff may modify electronic patient records. This leads to requirements to check the accreditation of anyone attempting to modify the system and to reject modifications from unaccredited people.

The problem that you may face is that many organizations do not have an explicit systems security policy. Over time, changes may have been made to systems in response to identified problems, but with no overarching policy document to guide the evolution of a system. In such situations, you need to work out and document the policy from examples and confirm it with managers in the company.

Figure 13.15 Design guidelines for secure systems engineering

1. Base security decisions on an explicit security policy
2. Use defense in depth
3. Fail securely
4. Balance security and usability
5. Log user actions
6. Use redundancy and diversity to reduce risk
7. Specify the format of system inputs
8. Compartmentalize your assets
9. Design for deployment
10. Design for recovery

Guideline 2: Use defense in depth

In any critical system, it is good design practice to try to avoid a single point of failure. That is, a single failure in part of the system should not result in an overall system failure. In security terms, this means that you should not rely on a single mechanism to ensure security; rather, you should employ several different techniques. This concept is sometimes called “defense in depth.”

An example of defense in depth is multifactor authentication. For example, if you use a password to authenticate users to a system, you may also include a challenge/response authentication mechanism where users have to pre-register questions and answers with the system. After they have input their login credentials, they must then answer questions correctly before being allowed access.
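The password-plus-challenge scheme described here amounts to two independent checks that must both succeed. The hashing function, the stored account record, and the question below are illustrative assumptions, not a prescribed design; a production system would use a salted, slow password hash rather than plain SHA-256.

```python
import hashlib

def digest(secret):
    # Illustration only: real systems would use a salted, slow password hash.
    return hashlib.sha256(secret.encode()).hexdigest()

# Hypothetical account with a password and a pre-registered question/answer.
account = {
    "password_hash": digest("correct horse battery"),
    "challenge": "Name of your first school?",
    "answer_hash": digest("hillside"),
}

def authenticate(password, answer):
    """Defense in depth: both factors must pass; either alone is insufficient."""
    return (digest(password) == account["password_hash"]
            and digest(answer) == account["answer_hash"])

granted = authenticate("correct horse battery", "hillside")
denied = authenticate("correct horse battery", "wrong answer")
```

An attacker who steals or guesses the password still fails the second check, which is the point of layering the mechanisms.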

Guideline 3: Fail securely

System failures are inevitable in all systems and, in the same way that safety-critical systems should always fail-safe, security-critical systems should always “fail-secure.” When the system fails, you should not use fallback procedures that are less secure than the system itself. Nor should system failure mean that an attacker can access data that would not normally be allowed.

For example, in the Mentcare system, I suggested a requirement that patient data should be downloaded to a system client at the beginning of a clinic session. This speeds up access and means that access is possible if the server is unavailable. Normally, the server deletes this data at the end of the clinic session. However, if the server has failed, then it is possible that the information on the client will be maintained. A fail-secure approach in those circumstances is to encrypt all patient data stored on the client. This means that an unauthorized user cannot read the data.
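Encrypting the patient data held on the client might be sketched as below. To keep the example dependency-free, it builds a toy stream cipher from SHA-256 in counter mode; this is an illustration of the fail-secure idea only, and a real system would use a vetted cipher such as AES through an established library.

```python
import hashlib
import secrets

def keystream(key, length):
    """Toy counter-mode keystream from SHA-256 (illustration, not a vetted cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key, data):
    # XOR with the keystream; the same operation encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

key = secrets.token_bytes(32)               # held by the authorized user only
record = b"patient: A. Patient; treatment: CBT"
stored_on_client = xor_cipher(key, record)  # unreadable if the laptop is stolen
recovered = xor_cipher(key, stored_on_client)
```

Even if the server fails before the client data is deleted, whoever later obtains the client machine sees only ciphertext, so the failure does not expose patient information.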

Guideline 4: Balance security and usability

The demands of security and usability are often contradictory. To make a system secure, you have to introduce checks that users are authorized to use the system and that they are acting in accordance with security policies. All of these inevitably make demands on users—they may have to remember login names and passwords, only use the system from certain computers, and so on. These mean that it takes users more time to get started with the system and use it effectively. As you add security features to a system, it usually becomes more difficult to use. I recommend Cranor and Garfinkel’s book (Cranor and Garfinkel 2005), which discusses a wide range of issues in the general area of security and usability.

There comes a point when it is counterproductive to keep adding new security features at the expense of usability. For example, if you require users to input multiple passwords or to change their passwords to impossible-to-remember character strings at frequent intervals, they will simply write down these passwords. An attacker (especially an insider) may then be able to find the passwords that have been written down and gain access to the system.

Guideline 5: Log user actions

If it is practically possible to do so, you should always maintain a log of user actions. This log should, at least, record who did what, the assets used, and the time and date of the action. If you maintain this as a list of executable commands, you can replay the log to recover from failures. You also need tools that allow you to analyze the log and detect potentially anomalous actions. These tools can scan the log and find anomalous actions, and thus help detect attacks and trace how the attacker gained access to the system.
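A minimal audit log that records who did what, to which asset, and when might look like the following sketch. The one-JSON-record-per-line format and the toy anomaly check are assumptions for illustration; a real system would write to append-only, tamper-evident storage:

```python
import json
import time

def log_action(log, user, action, asset, clock=time.time):
    """Append one audit record: who did what, the asset used, and when."""
    entry = {"user": user, "action": action, "asset": asset,
             "timestamp": clock()}
    log.append(json.dumps(entry))  # one JSON record per line

def find_anomalies(log, allowed_actions):
    """Toy analysis pass: flag entries whose action is not expected."""
    suspects = []
    for line in log:
        entry = json.loads(line)
        if entry["action"] not in allowed_actions:
            suspects.append(entry)
    return suspects
```

Because each entry is a self-contained record, the log can be replayed for recovery or scanned by separate analysis tools, as the text describes.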

Apart from helping recover from failure, a log of user actions is useful because it acts as a deterrent to insider attacks. If people know that their actions are being logged, then they are less likely to do unauthorized things. This is most effective for casual attacks, such as a nurse looking up the patient records of neighbors, or for detecting attacks where legitimate user credentials have been stolen through social engineering. Of course, this approach is not foolproof, as technically skilled insiders may also be able to access and change the log.

Guideline 6: Use redundancy and diversity to reduce risk

Redundancy means that you maintain more than one version of software or data in a system. Diversity, when applied to software, means that the different versions should not rely on the same platform or be implemented using the same technologies. Platform or technology vulnerabilities will then not affect all versions and so cannot lead to a common failure.

I have already discussed examples of redundancy—maintaining patient information on both the server and the client, first in the Mentcare system and then in the distributed equity trading system shown in Figure 13.14. In the patient records system, you could use diverse operating systems on the client and the server (e.g., Linux on the server, Windows on the client). This ensures that an attack based on an operating system vulnerability will not affect both the server and the client. Of course, running multiple operating systems leads to higher systems management costs. You have to trade off security benefits against this increased cost.
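The redundancy side of this guideline can be sketched as a simple failover read: try each copy of the data in turn and use the first that succeeds. The `servers` list of callables is a hypothetical interface for illustration:

```python
def fetch_record(record_id, servers):
    """Try each redundant copy in turn; return the first good one.

    'servers' is an ordered list of callables (e.g., the primary server
    first, then a replica on a diverse platform). Each callable raises
    an exception if its copy is unavailable or unreadable.
    """
    last_error = None
    for read in servers:
        try:
            return read(record_id)
        except Exception as err:   # this copy failed; try the next
            last_error = err
    raise RuntimeError("all redundant copies failed") from last_error
```

With diverse platforms behind each callable, a vulnerability in one platform leaves the other copies readable.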


Guideline 7: Specify the format of system inputs

A common attack on a system involves providing the system with unexpected inputs that cause it to behave in an unanticipated way. These inputs may simply cause a system crash, resulting in a loss of service, or the inputs could be made up of malicious code that is executed by the system. Buffer overflow vulnerabilities, first demonstrated in the Internet worm (Spafford 1989) and commonly used by attackers, may be triggered using long input strings. So-called SQL poisoning, where a malicious user inputs an SQL fragment that is interpreted by a server, is another fairly common attack.

You can avoid many of these problems if you specify the format and structure of the system inputs that are expected. This specification should be based on your knowledge of the expected system inputs. For example, if a surname is to be input, you might specify that all characters must be alphabetic, with no numbers or punctuation (apart from a hyphen) allowed. You might also limit the length of the name. For example, no one has a family name with more than 40 characters, and no addresses are more than 100 characters long. If a numeric value is expected, no alphabetic characters should be allowed. This information is then used in input checks when the system is implemented.
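The surname rule above translates directly into an input check. This sketch uses a regular expression; the ASCII-only alphabet is a simplifying assumption, so a real system would also need to handle accented and non-Latin names:

```python
import re

# Alphabetic characters only, internal hyphens allowed, and at most
# 40 characters in total -- the limits suggested in the text.
SURNAME_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")

def valid_surname(name):
    """Check a surname against the specified input format."""
    return len(name) <= 40 and SURNAME_RE.fullmatch(name) is not None
```

Note that the check is a whitelist: it describes what valid input looks like, rather than trying to enumerate every dangerous character.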

Guideline 8: Compartmentalize your assets

Compartmentalizing means that you should not provide users with access to all information in a system. Based on a general “need to know” security principle, you should organize the information in a system into compartments. Users should only have access to the information that they need for their work, rather than to all of the information in a system. This means that the effects of an attack that compromises an individual user account may be contained. Some information may be lost or damaged, but it is unlikely that all of the information in the system will be affected.

For example, the Mentcare system could be designed so that clinic staff will normally only have access to the records of patients who have an appointment at their clinic. They should not normally have access to all patient records in the system. Not only does this limit the potential loss from insider attacks, but it also means that if an intruder steals their credentials, then they cannot damage all patient records.

Having said this, you also may have to have mechanisms in the system to grant unexpected access—say, to a patient who is seriously ill and requires urgent treatment without an appointment. In those circumstances, you might use some alternative secure mechanism to override the compartmentalization in the system. In such situations, where security is relaxed to maintain system availability, it is essential that you use a logging mechanism to record system usage. You can then check the logs to trace any unauthorized use.
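The combination of need-to-know access and a logged emergency override can be sketched as a single check. The data structures here (a user dictionary and a set of clinic/patient appointment pairs) are hypothetical simplifications:

```python
def can_access(user, patient, appointments, audit_log, override=False):
    """Allow access only to patients with an appointment at the user's
    clinic (the 'need to know' compartment), unless an emergency
    override is requested -- in which case the access is always logged
    so that unusual use can be traced later.
    """
    if (user["clinic"], patient) in appointments:
        return True
    if override:
        audit_log.append(("OVERRIDE", user["name"], patient))
        return True
    return False
```

The key property is that relaxing the compartment never happens silently: every override leaves an audit record.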

Guideline 9: Design for deployment

Many security problems arise because the system is not configured correctly when it is deployed in its operational environment. Deployment means installing the software on the computers where it will execute and setting software parameters to reflect the execution environment and the preferences of the system user. Mistakes such as forgetting to turn off debugging facilities or forgetting to change the default administration password can introduce vulnerabilities into a system.

Good management practice can avoid many security problems that arise from configuration and deployment mistakes. However, software designers have the responsibility to “design for deployment.” You should always provide support for deployment that reduces the chances of users and system administrators making mistakes when configuring the software.

I recommend four ways to incorporate deployment support in a system:

1. Include support for viewing and analyzing configurations. You should always include facilities in a system that allow administrators or permitted users to examine the current configuration of the system.

2. Minimize default privileges. You should design software so that the default configuration of a system provides minimum essential privileges.

3. Localize configuration settings. When designing system configuration support, you should ensure that everything in a configuration that affects the same part of a system is set up in the same place.

4. Provide easy ways to fix security vulnerabilities. You should include straightforward mechanisms for updating the system to repair security vulnerabilities that have been discovered.
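The first three points above can be sketched together: secure-by-default settings held in one place, a single merge point for site-specific overrides, and a check that flags the deployment mistakes the text mentions. All the setting names here are hypothetical:

```python
# Secure defaults, localized in one place so administrators can view
# and audit the whole configuration (points 1 and 3 above).
DEFAULTS = {
    "debug_enabled": False,          # never ship with debugging on
    "admin_password_changed": False,
    "allowed_roles": ["reader"],     # minimal default privilege (point 2)
}

def effective_config(overrides):
    """Merge site-specific overrides onto the secure defaults."""
    config = dict(DEFAULTS)
    config.update(overrides)
    return config

def deployment_warnings(config):
    """Flag settings that commonly introduce vulnerabilities."""
    warnings = []
    if config["debug_enabled"]:
        warnings.append("debugging facilities are still enabled")
    if not config["admin_password_changed"]:
        warnings.append("default administration password in use")
    return warnings
```

Running `deployment_warnings` at startup gives administrators the "viewing and analyzing" support of point 1 in a minimal form.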

Deployment issues are less of a problem than they used to be, as more and more software does not require client installation. Rather, the software runs as a service and is accessed through a web browser. However, server software is still vulnerable to deployment errors and omissions, and some types of system require dedicated software running on the user’s computer.

Guideline 10: Design for recovery

Irrespective of how much effort you put into maintaining systems security, you should always design your system with the assumption that a security failure could occur. Therefore, you should think about how to recover from possible failures and restore the system to a secure operational state. For example, you may include a backup authentication system in case your password authentication is compromised.

For example, say an unauthorized person from outside the clinic gains access to the Mentcare system and you don’t know how that person obtained a valid login/password combination. You need to re-initialize the authentication system and not just change the credentials used by the intruder. This is essential because the intruder may also have gained access to other user passwords. You need, therefore, to ensure that all authorized users change their passwords. You also must ensure that the unauthorized person does not have access to the password-changing mechanism.


You therefore have to design your system to deny access to everyone until they have changed their password and to email all users asking them to make the change. You need an alternative mechanism to authenticate real users for password change, assuming that their chosen passwords may not be secure. One way of doing this is to use a challenge/response mechanism, where users have to answer questions for which they have pre-registered answers. This is only invoked when passwords are changed, allowing for recovery from the attack with relatively little user disruption.
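A challenge/response check of this kind can be sketched with the standard library. This is a deliberately simplified version — a production system would salt the stored digests and rate-limit attempts — but it shows the two essentials: only a digest of the pre-registered answer is stored, and comparison is constant-time:

```python
import hashlib
import hmac

def register_answer(answer):
    """Store only a digest of the pre-registered answer, not its text."""
    normalized = answer.strip().lower()   # tolerate spacing/case changes
    return hashlib.sha256(normalized.encode()).hexdigest()

def check_answer(stored_digest, attempt):
    """Constant-time comparison of the user's attempt with the digest."""
    attempt_digest = register_answer(attempt)
    return hmac.compare_digest(stored_digest, attempt_digest)
```

Using `hmac.compare_digest` rather than `==` avoids leaking information through comparison timing.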

Designing for recoverability is an essential element of building resilience into systems. I cover this topic in more detail in Chapter 14.

13.4.4 Secure systems programming

Secure system design means designing security into an application system. However, as well as focusing on security at the design level, it is also important to consider security when programming a software system. Many successful attacks on software rely on program vulnerabilities that were introduced when the program was developed.

The first widely known attack on Internet-based systems happened in 1988, when a worm was introduced into Unix systems across the network (Spafford 1989). This took advantage of a well-known programming vulnerability. If systems are programmed in C, there is no automatic array bound checking. An attacker can include a long string with program commands as an input, and this overwrites the program stack and can cause control to be transferred to malicious code. This vulnerability has been exploited in many other systems programmed in C or C++ since then.

This example illustrates two important aspects of secure systems programming:

1. Vulnerabilities are often language-specific. Array bound checking is automatic in languages such as Java, so this is not a vulnerability that can be exploited in Java programs. However, millions of programs are written in C and C++, as these allow for the development of more efficient software. Thus, simply avoiding the use of these languages is not a realistic option.

2. Security vulnerabilities are closely related to program reliability. The above example caused the program concerned to crash, so actions taken to improve program reliability can also improve system security.

In Chapter 11, I introduced programming guidelines for dependable system programming. These are shown in Figure 13.16. These guidelines also help improve the security of a program, as attackers focus on program vulnerabilities to gain access to a system. For example, an SQL poisoning attack is based on the attacker filling in a form with SQL commands rather than the text expected by the system. These can corrupt the database or release confidential information. You can completely avoid this problem if you implement input checks (Guideline 2) based on the expected format and structure of the inputs.
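Besides input checks, the standard defense against SQL poisoning is to use parameterized queries, so that user input can never be interpreted as SQL. A minimal sketch using Python's built-in `sqlite3` module (the table and column names are hypothetical):

```python
import sqlite3

def find_patient(conn, surname):
    """Look up patients by surname using a parameterized query.

    The '?' placeholder means the input is passed to the database as
    data, never spliced into the SQL text, so a malicious 'surname'
    cannot change the meaning of the query.
    """
    cursor = conn.execute(
        "SELECT id, surname FROM patients WHERE surname = ?", (surname,))
    return cursor.fetchall()
```

Contrast this with building the query by string concatenation, where the classic input `x' OR '1'='1` would turn the lookup into a query that matches every row.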


Figure 13.16 Dependable programming guidelines

1. Limit the visibility of information in a program.
2. Check all inputs for validity.
3. Provide a handler for all exceptions.
4. Minimize the use of error-prone constructs.
5. Provide restart capabilities.
6. Check array bounds.
7. Include timeouts when calling external components.
8. Name all constants that represent real-world values.

13.5 Security testing and assurance

The assessment of system security is increasingly important so that we can be confident that the systems we use are secure. The verification and validation processes for web-based systems should therefore focus on security assessment, where the ability of the system to resist different types of attack is tested. However, as Anderson explains (Anderson 2008), this type of security assessment is very difficult to carry out. Consequently, systems are often deployed with security loopholes. Attackers use these vulnerabilities to gain access to the system or to cause damage to the system or its data.

Fundamentally, security testing is difficult for two reasons:

1. Security requirements, like some safety requirements, are “shall not” requirements. That is, they specify what should not happen rather than system functionality or required behavior. It is not usually possible to define this unwanted behavior as simple constraints to be checked by the system.

If resources are available, you can demonstrate, in principle at least, that a system meets its functional requirements. However, it is impossible to prove that a system does not do something. Irrespective of the amount of testing, security vulnerabilities may remain in a system after it has been deployed.

You may, of course, generate functional requirements that are designed to guard the system against some known types of attack. However, you cannot derive requirements for unknown or unanticipated types of attack. Even in systems that have been in use for many years, an ingenious attacker can discover a new attack and can penetrate what was thought to be a secure system.

2. The people attacking a system are intelligent and are actively looking for vulnerabilities that they can exploit. They are willing to experiment with the system and to try things that are far outside normal activity and system use. For example, in a surname field they may enter 1000 characters with a mixture of letters, punctuation, and numbers simply to see how the system responds.

Once they find a vulnerability, they publicize it and so increase the number of possible attackers. Internet forums have been set up to exchange information about system vulnerabilities. There is also a thriving market in malware where attackers can get access to kits that help them easily develop malware such as worms and keystroke loggers.

Figure 13.17 Examples of entries in a security checklist

Security checklist
1. Do all files that are created in the application have appropriate access permissions? The wrong access permissions may lead to these files being accessed by unauthorized users.
2. Does the system automatically terminate user sessions after a period of inactivity? Sessions that are left active may allow unauthorized access through an unattended computer.
3. If the system is written in a programming language without array bound checking, are there situations where buffer overflow may be exploited? Buffer overflow may allow attackers to send code strings to the system and then execute them.
4. If passwords are set, does the system check that passwords are “strong”? Strong passwords consist of mixed letters, numbers, and punctuation, and are not normal dictionary entries. They are more difficult to break than simple passwords.
5. Are inputs from the system’s environment always checked against an input specification? Incorrect processing of badly formed inputs is a common cause of security vulnerabilities.

Attackers may try to discover the assumptions made by system developers and then challenge these assumptions to see what happens. They are in a position to use and explore a system over a period of time and analyze it using software tools to discover vulnerabilities that they may be able to exploit. They may, in fact, have more time to spend on looking for vulnerabilities than system test engineers, as testers must also focus on testing the system.

You may use a combination of testing, tool-based analysis, and formal verification to check and analyze the security of an application system:

1. Experience-based testing In this case, the system is analyzed against types of attack that are known to the validation team. This may involve developing test cases or examining the source code of a system. For example, to check that the system is not susceptible to the well-known SQL poisoning attack, you might test the system using inputs that include SQL commands. To check that buffer overflow errors will not occur, you can examine all input buffers to see if the program is checking that assignments to buffer elements are within bounds. Checklists of known security problems may be created to assist with the process. Figure 13.17 gives some examples of questions that might be used to drive experience-based testing. Checks on whether design and programming guidelines for security have been followed may also be included in a security problem checklist.

2. Penetration testing This is a form of experience-based testing where it is possible to draw on experience from outside the development team to test an application system. The penetration testing teams are given the objective of breaching the system security. They simulate attacks on the system and use their ingenuity to discover new ways to compromise the system security. Penetration testing team members should have previous experience with security testing and finding security weaknesses in systems.

3. Tool-based analysis In this approach, security tools such as password checkers are used to analyze the system. Password checkers detect insecure passwords such as common names or strings of consecutive letters. This approach is really an extension of experience-based validation, where experience of security flaws is embodied in the tools used. Static analysis is, of course, another type of tool-based analysis, which has become increasingly used.

Tool-based static analysis (Chapter 12) is a particularly useful approach to security checking. A static analysis of a program can quickly guide the testing team to areas of a program that may include errors and vulnerabilities. Anomalies revealed in the static analysis can be directly fixed or can help identify tests that need to be done to reveal whether or not these anomalies actually represent a risk to the system. Microsoft uses static analysis routinely to check its software for possible security vulnerabilities (Jenney 2013). Hewlett-Packard offers a tool called Fortify (Hewlett-Packard 2012) specifically designed for checking Java programs for security vulnerabilities.

4. Formal verification I have discussed the use of formal program verification in Chapters 10 and 12. Essentially, this involves making formal, mathematical arguments that demonstrate that a program conforms to its specification. Hall and Chapman (Hall and Chapman 2002) demonstrated the feasibility of proving that a system met its formal security requirements more than 10 years ago, and there have been a number of other experiments since then. However, as in other areas, formal verification for security is not widely used. It requires specialist expertise and is unlikely to be as cost-effective as static analysis.
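A toy version of the password checker mentioned in approach 3 shows how experience of security flaws can be embodied in a tool. The specific rules and the small dictionary here are illustrative assumptions; real checkers use large dictionaries and many more heuristics:

```python
import string

# A stand-in for the large dictionaries real password checkers use.
COMMON_PASSWORDS = {"password", "letmein", "qwerty", "123456"}

def weak_password_reasons(password):
    """Return the reasons a password is insecure (empty list if none).

    Mirrors typical password-checker heuristics: dictionary words,
    short length, runs of consecutive characters, and missing
    character classes.
    """
    reasons = []
    if password.lower() in COMMON_PASSWORDS:
        reasons.append("common password")
    if len(password) < 8:
        reasons.append("too short")
    if len(password) >= 3 and all(
            ord(b) - ord(a) == 1 for a, b in zip(password, password[1:])):
        reasons.append("consecutive characters")
    if not any(c in string.punctuation for c in password):
        reasons.append("no punctuation")
    if not any(c.isdigit() for c in password):
        reasons.append("no digit")
    return reasons
```

Returning the reasons, rather than a bare pass/fail, lets the tool give users actionable feedback.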

Security testing takes a long time, and, usually, the time available to the testing team is limited. This means that you should adopt a risk-based approach to security testing and focus on what you think are the most significant risks faced by the system. If you have an analysis of the security risks to the system, these can be used to drive the testing process. As well as testing the system against the security requirements derived from these risks, the test team should also try to break the system by adopting alternative approaches that threaten the system assets.

Key Points

- Security engineering focuses on how to develop and maintain software systems that can resist malicious attacks intended to damage a computer-based system or its data.

- Security threats can be threats to the confidentiality, integrity, or availability of a system or its data.

- Security risk management involves assessing the losses that might ensue from attacks on a system, and deriving security requirements that are aimed at eliminating or reducing these losses.

- To specify security requirements, you should identify the assets that are to be protected and define how security techniques and technology should be used to protect these assets.

- Key issues when designing a secure systems architecture include organizing the system structure to protect key assets and distributing the system assets to minimize the losses from a successful attack.

- Security design guidelines sensitize system designers to security issues that they may not have considered. They provide a basis for creating security review checklists.

- Security validation is difficult because security requirements state what should not happen in a system, rather than what should. Furthermore, system attackers are intelligent and may have more time to probe for weaknesses than is available for security testing.

Further Reading

Security Engineering: A Guide to Building Dependable Distributed Systems, 2nd ed. This is a thorough and comprehensive discussion of the problems of building secure systems. The focus is on systems rather than software engineering, with extensive coverage of hardware and networking, and excellent examples drawn from real system failures. (R. Anderson, John Wiley & Sons, 2008) http://www.cl.cam.ac.uk/~rja14/book.html

24 Deadly Sins of Software Security: Programming Flaws and How to Fix Them. I think this is one of the best practical books on secure systems programming. The authors discuss the most common programming vulnerabilities and describe how they can be avoided in practice. (M. Howard, D. LeBlanc, and J. Viega, McGraw-Hill, 2009)

Computer Security: Principles and Practice. This is a good general text on computer security issues. It covers security technology, trusted systems, security management, and cryptography. (W. Stallings and L. Brown, Addison-Wesley, 2012)

Web Site

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/security-and-resilience/


Exercises

13.1. Describe the security dimensions and security levels that have to be considered in secure systems engineering.

13.2. For the Mentcare system, suggest an example of an asset, an exposure, a vulnerability, an attack, a threat, and a control, in addition to those discussed in this chapter.

13.3. Explain why security is considered a more challenging problem than safety in a system.

13.4. Extend the table in Figure 13.7 to identify two further threats to the Mentcare system, along with associated controls. Use these as a basis for generating software security requirements that implement the proposed controls.

13.5. Explain, using an analogy drawn from a non-software engineering context, why a layered approach to asset protection should be used.

13.6. Explain why it is important to log user actions in the development of secure systems.

13.7. For the equity trading system discussed in Section 13.4.2, whose architecture is shown in Figure 13.14, suggest two further plausible attacks on the system and propose possible strategies that could counter these attacks.

13.8. Explain why it is important when writing secure systems to validate all user inputs to check that these have the expected format.

13.9. Suggest how you would go about validating a password protection system for an application that you have developed. Explain the function of any tools that you think may be useful.

13.10. The Mentcare system has to be secure against attacks that might reveal confidential patient information. Suggest three possible attacks against this system that might occur. Using this information, extend the checklist in Figure 13.17 to guide testers of the Mentcare system.

References

Anderson, R. 2008. Security Engineering, 2nd ed. Chichester, UK: John Wiley & Sons.

Cranor, L., and S. Garfinkel. 2005. Designing Secure Systems That People Can Use. Sebastopol, CA: O’Reilly Media Inc.

Firesmith, D. G. 2003. “Engineering Security Requirements.” Journal of Object Technology 2 (1): 53–68. http://www.jot.fm/issues/issue_2003_01/column6

Hall, A., and R. Chapman. 2002. “Correctness by Construction: Developing a Commercially Secure System.” IEEE Software 19 (1): 18–25. doi:10.1109/52.976937.

Hewlett-Packard. 2012. “Securing Your Enterprise Software: HP Fortify Code Analyzer.” http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA4-2455ENW&cc=us&lc=en

Jenney, P. 2013. “Static Analysis Strategies: Success with Code Scanning.” http://msdn.microsoft.com/en-us/security/gg615593.aspx

Lane, A. 2010. “Agile Development and Security.” https://securosis.com/blog/agile-development-and-security

Pfleeger, C. P., and S. L. Pfleeger. 2007. Security in Computing, 4th ed. Boston: Addison-Wesley.

Safecode. 2012. “Practical Security Stories and Security Tasks for Agile Development Environments.” http://www.safecode.org/publications/SAFECode_Agile_Dev_Security0712.pdf

Schneier, B. 1999. “Attack Trees.” Dr Dobbs Journal 24 (12): 1–9. https://www.schneier.com/paper-attacktrees-ddj-ft.html

Schneier, B. 2000. Secrets and Lies: Digital Security in a Networked World. New York: John Wiley & Sons.

Schoenfield, B. 2013. “Agile and Security: Enemies for Life?” http://brookschoenfield.com/?p=151

Sindre, G., and A. L. Opdahl. 2005. “Eliciting Security Requirements through Misuse Cases.” Requirements Engineering 10 (1): 34–44. doi:10.1007/s00766-004-0194-4.

Spafford, E. 1989. “The Internet Worm: Crisis and Aftermath.” Comm ACM 32 (6): 678–687. doi:10.1145/63526.63527.

Stallings, W., and L. Brown. 2012. Computer Security: Principles and Practice, 2nd ed. Boston: Addison-Wesley.

Viega, J., and G. McGraw. 2001. Building Secure Software. Boston: Addison-Wesley.

Wheeler, D. A. 2004. Secure Programming for Linux and Unix. Self-published. http://www.dwheeler.com/secure-programs/

14 Resilience engineering

Objectives

The objective of this chapter is to introduce the idea of resilience engineering, where systems are designed to withstand adverse external events such as operator errors and cyberattacks. When you have read this chapter, you will:

- understand the differences between resilience, reliability, and security and why resilience is important for networked systems;

- be aware of the fundamental issues in building resilient systems, namely, recognition of problems, resistance to failures and attacks, recovery of critical services, and system reinstatement;

- understand why resilience is a sociotechnical rather than a technical issue and the role of system operators and managers in providing resilience;

- have been introduced to a system design method that supports resilience.

Contents

14.1 Cybersecurity

14.2 Sociotechnical resilience

14.3 Resilient systems design


In April 1970, the Apollo 13 manned mission to the moon suffered a catastrophic failure. An oxygen tank exploded in space, resulting in a serious loss of atmospheric oxygen and oxygen for the fuel cells that powered the spacecraft. The situation was life threatening, with no possibility of rescue. There were no contingency plans for this situation. However, by using equipment in unintended ways and by adapting standard procedures, the combined efforts of the spacecraft crew and ground staff worked around the problems. The spacecraft was brought back to earth safely, and all the crew survived. The overall system (people, equipment, and processes) was resilient. It adapted to cope with and recover from the failure.

I introduced the idea of resilience in Chapter 10 as one of the fundamental attributes of system dependability. I defined resilience in Chapter 10 as:

The resilience of a system is a judgment of how well that system can maintain the continuity of its critical services in the presence of disruptive events, such as equipment failure and cyberattacks.

This is not a “standard” definition of resilience—different authors such as Laprie (Laprie 2008) and Hollnagel (Hollnagel 2006) propose general definitions based on the ability of a system to withstand change. That is, a resilient system is one that can operate successfully when some of the fundamental assumptions made by the system designers no longer hold.

For example, an initial design assumption may be that users will make mistakes but will not deliberately seek out system vulnerabilities to be exploited. If the system is used in an environment where it may be subject to cyberattacks, this is no longer true. A resilient system can cope with the environmental change and can continue to operate successfully.

While these definitions are more general, my definition of resilience is closer to how the term is now used in practice by governments and industry. It embeds three essential ideas:

1. The idea that some of the services offered by a system are critical services whose failure could have serious human, social, or economic effects.

2. The idea that some events are disruptive and can affect the ability of a system to deliver its critical services.

3. The idea that resilience is a judgment—there are no resilience metrics, and resilience cannot be measured. The resilience of a system can only be assessed by experts, who can examine the system and its operational processes.

Fundamental work on system resilience started in the safety-critical systems community, where the aim was to understand what factors led to accidents being avoided and survived. However, the increasing number of cyberattacks on networked systems has meant that resilience is now often seen as a security issue. It is essential to build systems that can withstand malicious cyberattacks and continue to deliver services to their users.


Obviously, resilience engineering is closely related to reliability and security engineering. The aim of reliability engineering is to ensure that systems do not fail. A system failure is an externally observable event, which is often a consequence of a fault in the system. Therefore, techniques such as fault avoidance and fault tolerance, as discussed in Chapter 11, have been developed to reduce the number of system faults and to trap faults before they lead to system failure.

In spite of our best efforts, faults will always be present in a large, complex system, and they may lead to system failure. Delivery schedules are short, and testing budgets are limited. Development teams are working under pressure, and it is practically impossible to detect all of the faults and security vulnerabilities in a software system. We are building systems that are so complex (see Chapter 19) that we cannot possibly understand all of the interactions between the system components. Some of these interactions may be a trigger for overall system failure.

Resilience engineering does not focus on avoiding failure but rather on accepting the reality that failures will occur. It makes two important assumptions:

1. Resilience engineering assumes that it is impossible to avoid system failures and so is concerned with limiting the costs of these failures and recovering from them.

2. Resilience engineering assumes that good reliability engineering practices have been used to minimize the number of technical faults in a system. It therefore places more emphasis on limiting the number of system failures that arise from external events such as operator errors or cyberattacks.

In practice, technical system failures are often triggered by events that are

external to the system. These events may involve operator actions or user

errors that are unexpected. Over the last few years, however, as the

number of networked systems has

increased, these events have often been cyberattacks. In a cyberattack, a

malicious person or group tries to damage the system or to steal

confidential information. These are now more significant than user or

operator errors as a potential source of system failure.

Because of the assumption that failures will inevitably occur, resilience engineering is concerned with both the immediate recovery from failure to maintain critical services and the longer-term reinstatement of all system services. As I discuss in Section 14.3, this means that system designers have to include system features to maintain the state of the system’s software and data. In the event of a failure, essential information can then be restored.

Four related resilience activities are involved in the detection of and recovery from system problems:

1. Recognition The system or its operators should be able to recognize the symptoms of a problem that may lead to system failure. Ideally, this recognition should be possible before the failure occurs.

2. Resistance If the symptoms of a problem or signs of a cyberattack are detected early, resistance strategies may be invoked to reduce the probability that the system will fail. These strategies may focus on isolating critical parts of the system so that they are unaffected by problems elsewhere. Resistance includes proactive resistance, where defenses are included in a system to trap problems, and reactive resistance, where actions are taken when a problem is discovered.

Figure 14.1 Resilience activities: starting from the normal operating state, attack recognition leads to attack resistance, with restricted service delivery. If the attack is repelled, normal operation resumes; if the attack is successful, system repair maintains critical service delivery, and when repair is complete, software and data restoration reinstates full service.

3. Recovery If a failure occurs, the aim of the recovery activity is to ensure that critical system services are restored quickly so that system users are not seriously affected by the failure.

4. Reinstatement In this final activity, all of the system services are restored, and normal system operation can continue.

These activities lead to changes to the system state, as shown in Figure 14.1, which illustrates the state changes in the event of a cyberattack. In parallel with normal system operation, the system monitors network traffic for possible cyberattacks. In the event of a cyberattack, the system moves to a resistance state in which normal services may be restricted.

If resistance successfully repels the attack, normal service is resumed. Otherwise, the system moves to a recovery state where only critical services are available, and repairs to the damage caused by the cyberattack are carried out. When repairs are complete, the system moves to a reinstatement state, in which its services are incrementally restored. Finally, when all restoration is complete, normal service is resumed.
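The state changes described above can be sketched as a simple state machine. This is an illustrative sketch only; the state and event names are assumptions based on the description of Figure 14.1, not part of any system discussed in this book.

```python
# Sketch of the resilience state transitions described above.
# State and event names are illustrative assumptions, not a standard API.

TRANSITIONS = {
    ("NORMAL", "attack_detected"): "RESISTANCE",
    ("RESISTANCE", "attack_repelled"): "NORMAL",
    ("RESISTANCE", "attack_successful"): "RECOVERY",
    ("RECOVERY", "repair_complete"): "REINSTATEMENT",
    ("REINSTATEMENT", "reinstatement_complete"): "NORMAL",
}

def next_state(state, event):
    """Return the next resilience state; stay put on irrelevant events."""
    return TRANSITIONS.get((state, event), state)

# A successful attack drives the system through all four activities and back:
state = "NORMAL"
for event in ["attack_detected", "attack_successful",
              "repair_complete", "reinstatement_complete"]:
    state = next_state(state, event)
# state is "NORMAL" again after recovery and reinstatement
```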

As the Apollo 13 example illustrates, resilience cannot be “programmed in” to a system. It is impossible to anticipate everything that might go wrong and every context where problems might arise. The key to resilience, therefore, is flexibility and adaptability. As I discuss in Section 14.2, it should be possible for system operators and managers to take actions to protect and repair the system, even if these actions are abnormal or are normally disallowed.

Increasing the resilience of a system, of course, has significant costs. Software may have to be purchased or modified, and additional investments made in hardware or cloud services to provide backup systems that can be used in the event of a system failure. The benefits of this expenditure are impossible to calculate in advance because the losses from a failure or attack can only be calculated after the event.

Companies may therefore be reluctant to invest in resilience if they have never suffered a serious attack or associated loss. However, the increasing number of high-profile cyberattacks that have damaged business and government systems has increased awareness of the need for resilience. It is clear that losses can be very significant, and sometimes businesses may not survive a successful cyberattack. There is therefore increasing investment in resilience engineering to reduce the business risks associated with system failure.

14.1 Cybersecurity

Maintaining the security of our networked infrastructure and of government, business, and personal computer systems is one of the most significant problems facing our society. The ubiquity of the Internet and our dependence on computer systems have created new criminal opportunities for theft and social disruption. It is very difficult to measure the losses due to cybercrime; however, in 2013 it was estimated that losses to the global economy due to cybercrime were between $100 billion and $500 billion (InfoSecurity 2013).

As I suggested in Chapter 13, cybersecurity is a broader issue than system security engineering. Software security engineering is a primarily technical activity that focuses on techniques and technologies to ensure that application systems are secure. Cybersecurity, by contrast, is a sociotechnical concern. It covers all aspects of ensuring the protection of citizens, businesses, and critical infrastructures from threats that arise from their use of computers and the Internet. While technical issues are important, technology on its own cannot guarantee security. Factors that contribute to cybersecurity failures include organizational ignorance of the seriousness of the problem, poor design and lax application of security procedures, human carelessness, and inappropriate trade-offs between usability and security.

Cybersecurity is concerned with all of an organization’s IT assets, from networks through to application systems. The vast majority of these assets are externally procured, and companies do not understand their detailed operation. Systems such as web browsers are large and complex programs that inevitably contain bugs, which can be a source of vulnerability. The different systems in an organization are related to each other in many ways: they may be stored on the same disk, share data, rely on common operating system components, and so on. The organizational “system of systems” is incredibly complex, and it is impossible to ensure that it is free of security vulnerabilities.

Consequently, you should generally assume that your systems are vulnerable to cyberattack and that, at some stage, a cyberattack is likely to occur. A successful cyberattack can have very serious financial consequences for businesses, so it is essential that attacks are contained and losses minimized. Effective resilience engineering at the organizational and systems levels can repel attacks and bring systems back into operation quickly, and so limit the losses incurred.


In Chapter 13, where I discussed security engineering, I introduced concepts that are fundamental to resilience planning. Some of these concepts are:

1. Assets, which are systems and data that have to be protected. Some assets are more valuable than others and so require a higher level of protection.

2. Threats, which are circumstances that can cause harm by damaging or stealing organizational IT infrastructure or system assets.

3. Attacks, which are manifestations of a threat where an attacker aims to damage or steal IT assets, such as websites or personal data.

Three types of threat have to be considered in resilience planning:

1. Threats to the confidentiality of assets In this case, data is not damaged, but it is made available to people who should not have access to it. An example of a threat to confidentiality is the theft of a credit card database held by a company, with the potential for illegal use of card information.

2. Threats to the integrity of assets These are threats where systems or data are damaged in some way by a cyberattack. This may involve introducing a virus or a worm into software or corrupting organizational databases.

3. Threats to the availability of assets These are threats that aim to deny authorized users the use of assets. The best-known example is a denial-of-service attack that aims to take down a website and so make it unavailable for external use.

These are not independent threat classes. An attacker may compromise the integrity of a user’s system by introducing malware, such as a botnet component, which may then be invoked remotely as part of a distributed denial-of-service attack on another system. Other types of malware may be used to capture personal details and so allow confidential assets to be accessed.

To counter these threats, organizations should put controls in place that make it difficult for attackers to access or damage assets. It is also important to raise awareness of cybersecurity issues so that people know why these controls are important and so are less likely to reveal information to an attacker. Examples of controls that may be used are:

1. Authentication, where users of a system have to show that they are authorized to access the system. The familiar login/password approach to authentication is a universally used but rather weak control.

2. Encryption, where data is algorithmically scrambled so that an unauthorized reader cannot access the information. Many companies now require that laptop disks be encrypted; if a computer is lost or stolen, this reduces the likelihood that the confidentiality of the information will be breached.

3. Firewalls, where incoming network packets are examined and then accepted or rejected according to a set of organizational rules. Firewalls can be used to ensure that only traffic from trusted sources is allowed to pass from the external Internet into the local organizational network.
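The firewall control in item 3 can be illustrated with a minimal rule-matching sketch. The rule format and field names here are illustrative assumptions; real firewalls have much richer rule languages.

```python
# Minimal sketch of firewall-style packet filtering: each incoming packet is
# checked against ordered organizational rules; the first matching rule
# decides, and the default is to reject. Rule fields are illustrative.

RULES = [
    # (source address prefix, destination port, decision)
    ("192.168.", None, "accept"),   # trusted internal network, any port
    (None,       443,  "accept"),   # HTTPS traffic from anywhere
    (None,       None, "reject"),   # default: deny everything else
]

def filter_packet(src_ip, dst_port):
    for src_prefix, port, decision in RULES:
        if src_prefix is not None and not src_ip.startswith(src_prefix):
            continue   # rule constrains the source and it does not match
        if port is not None and dst_port != port:
            continue   # rule constrains the port and it does not match
        return decision
    return "reject"
```

For example, a Telnet connection attempt from an external address, `filter_packet("10.0.0.5", 23)`, falls through to the default rule and is rejected, while internal traffic is accepted.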

A set of controls in an organization provides a layered protection system. An attacker has to get through all of the protection layers for an attack to succeed. However, there is a trade-off between protection and efficiency. As the number of layers of protection increases, the system slows down: the protection systems consume an increasing amount of memory and processor resources, leaving less available to do useful work. Furthermore, the more security there is, the more inconvenient the system is for users, and the more likely they are to adopt insecure practices to increase system usability.

As with other aspects of system dependability, the fundamental means of protecting against cyberattacks depends on redundancy and diversity. Recall that redundancy means having spare capacity and duplicated resources in a system. Diversity means that different types of equipment, software, and procedures are used, so that common failures are less likely to occur across a number of systems. Examples of where redundancy and diversity are valuable for cyber-resilience are:

1. For each system, copies of data and software should be maintained on separate computer systems, and shared disks should be avoided if possible. This supports recovery after a successful cyberattack (recovery and reinstatement).

2. Multi-stage diverse authentication can protect against password attacks. As well as login/password authentication, additional authentication steps may be involved that require users to provide some personal information or a code generated by their mobile device (resistance).

3. Critical servers may be overprovisioned; that is, they may be more powerful than is required to handle their expected load. The spare capacity means that attacks may be resisted without necessarily degrading the normal response of the server. Furthermore, if other servers are damaged, spare capacity is available to run their software while they are being repaired (resistance and recovery).
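Multi-stage diverse authentication (item 2) can be sketched as a set of independent checks that must all pass. The stage names, stored values, and code mechanism below are illustrative assumptions, not any particular product’s design.

```python
# Sketch of multi-stage diverse authentication (resistance): each stage is
# an independent check, and all stages must pass. Values are illustrative.

def check_password(supplied, expected):
    return supplied == expected          # real systems compare salted hashes

def check_device_code(supplied, expected):
    return supplied == expected          # e.g. a one-time code from a phone

def authenticate(password, device_code, account):
    stages = [
        check_password(password, account["password"]),
        check_device_code(device_code, account["current_code"]),
    ]
    # Diversity: a stolen password alone does not defeat authentication.
    return all(stages)

account = {"password": "s3cret", "current_code": "492817"}
ok = authenticate("s3cret", "492817", account)        # both stages pass
stolen_password_only = authenticate("s3cret", "000000", account)  # fails
```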

Planning for cybersecurity has to be based on assets and controls and on the 4 Rs of resilience engineering: recognition, resistance, recovery, and reinstatement. Figure 14.2 shows a planning process that may be followed. The key stages in this process are:

1. Asset classification The organization’s hardware, software, and human assets are examined and classified according to how essential they are to normal operations. They may be classed as critical, important, or useful.

2. Threat identification For each of the assets (or at least the critical and important assets), you should identify and classify the threats to that asset. In some cases, you may try to estimate the probability that a threat will arise, but such estimates are often inaccurate because you don’t have enough information about potential attackers.

3. Threat recognition For each threat or, sometimes, asset/threat pair, you should identify how an attack based on that threat might be recognized. You may decide that additional software needs to be bought or written for threat recognition, or that regular checking procedures should be put in place.

Figure 14.2 Cyber-resilience planning: a process in which asset classification is followed by threat identification, threat recognition, threat resistance, asset recovery, and asset reinstatement, with the results recorded in a cyber-resilience plan.

4. Threat resistance For each threat or asset/threat pair, you should identify possible resistance strategies. These may either be embedded in the system (technical strategies) or rely on operational procedures. You may also need to think of threat neutralization strategies so that the threat does not recur.

5. Asset recovery For each critical asset or asset/threat pair, you should work out how that asset could be recovered in the event of a successful cyberattack. This may involve making extra hardware available or changing backup procedures to make it easier to access redundant copies of data.

6. Asset reinstatement This is a more general process of asset recovery in which you define procedures to bring the system back into normal operation. Asset reinstatement should be concerned with all assets and not simply with assets that are critical to the organization.
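The six planning stages can be captured as one record per asset (or asset/threat pair). The field names, classifications, and example values below are illustrative assumptions, not a standard plan format.

```python
# Sketch of a cyber-resilience plan entry covering the six stages above.
# Field names and example values are illustrative assumptions.

plan_entry = {
    "asset": "patient-records database",
    "classification": "critical",            # critical / important / useful
    "threat": "theft of confidential data",  # threat identification
    "recognition": "alert on bulk record access outside working hours",
    "resistance": "multi-stage authentication; encrypted storage",
    "recovery": "restore critical data from separate backup systems",
    "reinstatement": "bring remaining services back from verified backups",
}

def critical_entries(plan):
    """Recovery planning is required at least for critical assets."""
    return [e for e in plan if e["classification"] == "critical"]

plan = [plan_entry]
# critical_entries(plan) selects the entries needing full recovery planning
```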

Information about all of these stages should be maintained in a cyber-resilience plan. This plan should be regularly updated and, wherever possible, the strategies identified should be tested in mock attacks on the system.

Another important part of cyber-resilience planning is to decide how to support a flexible response in the event of a cyberattack. Paradoxically, resilience and security requirements often conflict. The aim of security is usually to limit privilege as far as possible, so that users can only do what the security policy of the organization allows. However, to deal with problems, a user or system operator may have to take the initiative and carry out actions that are normally reserved for someone with a higher level of privilege.

For example, the system manager of a medical system may not normally be allowed to change the access rights of medical staff to records. For security reasons, access permissions have to be formally authorized, and two people need to be involved in making the change. This reduces the chances of a system manager colluding with attackers and allowing access to confidential medical information.

Now, imagine that the system manager notices that a logged-in user is accessing a large number of records outside of normal working hours. The manager suspects that an account has been compromised and that the user accessing the records is not actually the authorized user. To limit the damage, the user’s access rights should be removed and a check then made with the authorized user to see whether the accesses were actually illegal. However, the security procedures limiting the rights of system managers to change users’ permissions make this impossible.

Resilience planning should take such situations into account. One way of doing so is to include an “emergency” mode in systems, in which normal checks are ignored. Rather than forbidding operations, the system logs what has been done and who was responsible, so the audit trail of emergency actions can be used to check that a system manager’s actions were justified. Of course, there is scope for misuse here, and the existence of an emergency mode is itself a potential vulnerability. Organizations therefore have to trade off possible losses against the benefits of adding more features to a system to support resilience.
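The emergency mode described above can be sketched as an audit-logged bypass: instead of forbidding a privileged operation, the system performs it and records who did what, and when, for later justification. The function and field names are illustrative assumptions based on the medical-system example.

```python
import datetime

audit_trail = []  # in practice, an append-only, tamper-evident log

def revoke_access(manager, user, emergency=False):
    """Sketch: revoking access normally needs two-person authorization, but
    in emergency mode the action is allowed and logged for later review."""
    if not emergency:
        return "refused: two-person authorization required"
    audit_trail.append({
        "action": "revoke_access",
        "target": user,
        "by": manager,
        "mode": "emergency",
        "at": datetime.datetime.now().isoformat(),
    })
    return "access revoked"

# Normal operation is refused; the emergency path acts and leaves a record
# that can later be used to check the manager's justification.
normal = revoke_access("sys_manager_1", "suspect_account")
result = revoke_access("sys_manager_1", "suspect_account", emergency=True)
```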

14.2 Sociotechnical resilience

Fundamentally, resilience engineering is a sociotechnical rather than a technical activity. As I explained in Chapter 10, a sociotechnical system includes hardware, software, and people, and is influenced by the culture, policies, and procedures of the organization that owns and uses the system. To design a resilient system, you have to think about sociotechnical systems design and not focus exclusively on software. Resilience engineering is concerned with adverse external events that can lead to system failure, and dealing with these events is often easier and more effective in the broader sociotechnical system.

For example, the Mentcare system maintains confidential patient data, and a possible external cyberattack may aim to steal that data. Technical safeguards such as authentication and encryption may be used to protect the data, but these are not effective if an attacker has access to the credentials of a genuine system user. You could try to solve this problem at the technical level by using more complex authentication procedures. However, these procedures annoy users and may create new vulnerabilities as users write down authentication information that they cannot remember. A better strategy may be to introduce organizational policies and procedures that emphasize the importance of not sharing login credentials and that tell users about easy ways to create and remember strong passwords.

Resilient systems are flexible and adaptable so that they can cope with the unexpected. It is very difficult to create software that can adapt to cope with problems that have not been anticipated. However, as we saw from the Apollo 13 accident, people are very good at this. Therefore, to achieve resilience, you should take advantage of the fact that people are an inherent part of sociotechnical systems. Rather than trying to anticipate and deal with all problems in software, you should leave some types of problem solving to the people responsible for operating and managing the software system.

To understand why you should leave some types of problem solving to people, you have to consider the hierarchy of sociotechnical systems that includes technical, software-intensive systems.

Figure 14.3 Nested technical and sociotechnical systems: technical systems S1 and S2 sit within sociotechnical system ST1, which includes operators; ST1 sits within the organization, which includes managers. Failures may propagate upward from the technical systems.

Figure 14.3 shows that technical systems S1 and S2 are part of a broader sociotechnical system ST1. That sociotechnical system includes

includes

operators who monitor the condition of S1 and S2 and who can take

actions to

resolve problems in these systems. If system S1 (say) fails, then the

operators in

ST1 may detect that failure and take recovery actions before the software

failure

leads to failure in the broader sociotechnical system. Operators may also

invoke

recovery and reinstatement procedures to get S1 back to its normal

operating state.

Operational and management processes are the interface between the

organiza-

tion and the technical systems that are used. If these processes are well

designed,

they allow people to discover and to cope with technical system failures,

as well as ensuring that operator errors are minimized. As I discuss in

Section 14.2.2, rigid

processes that are overautomated are not inherently resilient. They do not

allow people to use their skills and knowledge to adapt and change

processes to cope with the unexpected and deal with unanticipated

failures.

The system ST1 is one of a number of sociotechnical systems in the

organization.

If the system operators cannot contain a technical system failure, then this

may lead to a failure in the sociotechnical system ST1. Managers at the

organizational level

then must detect the problem and take steps to recover from it. Resilience

is there-

fore an organizational as well as a system characteristic.

Hollnagel (Hollnagel 2010), an early advocate of resilience engineering, argues that it is important for organizations to study and learn from successes as well as failures. High-profile safety and security failures lead to inquiries and to changes in practice and procedures. However, rather than responding to these failures, it is better to avoid them by observing how people deal with problems and maintain resilience. This good practice can then be disseminated throughout the organization.

Figure 14.4 shows four characteristics that Hollnagel suggests reflect the resilience of an organization. These characteristics are:

1. The ability to respond Organizations have to be able to adapt their processes and procedures in response to risks. These may be anticipated risks, or they may be detected threats to the organization and its systems. For example, if a new security threat is detected and publicized, a resilient organization can make changes quickly so that this threat does not disrupt its operations.

2. The ability to monitor Organizations should monitor both their internal operations and their external environment for threats before they arise. For example, a company should monitor how its employees follow security policies. If potentially insecure behavior is detected, the company should respond by taking actions to understand why it has occurred and to change employee behavior.

Figure 14.4 Characteristics of resilient organizations: responding to threats and vulnerabilities; monitoring the organization and environment; anticipating future threats and opportunities; and learning from experience.

3. The ability to anticipate A resilient organization should not simply focus on its current operations but should anticipate possible future events and changes that may affect its operations and resilience. These events may include technological innovations, changes in regulations or laws, and changes in customer behavior. For example, wearable technology is starting to become available, and companies should now be thinking about how it might affect their current security policies and procedures.

4. The ability to learn Organizational resilience can be improved by learning from experience. It is particularly important to learn from successful responses to adverse events, such as the effective resistance of a cyberattack. Learning from success allows good practice to be disseminated throughout the organization.

As Hollnagel says, to become resilient, organizations have to address all of these issues to some extent, although some will focus more on one quality than on others. For example, a company running a large-scale data center may focus mostly on monitoring and responsiveness, whereas a digital library that manages long-term archival information may have to anticipate how future changes could affect its business as well as respond to any immediate security threats.

14.2.1 Human error

Early work on resilience engineering was concerned with accidents in safety-critical systems and with how the behavior of human operators could lead to safety-related system failures. This led to an understanding of system defenses that is equally applicable to systems that have to withstand malicious as well as accidental human actions.

We know that people make mistakes and that, unless a system is completely automated, it is inevitable that users and system operators will sometimes do the wrong thing. Unfortunately, these human errors sometimes lead to serious system failures. Reason (Reason 2000) suggests that the problem of human error can be viewed in two ways:

1. The person approach Errors are considered to be the responsibility of the individual, and “unsafe acts” (such as an operator failing to engage a safety barrier) are a consequence of individual carelessness or reckless behavior. People who adopt this approach believe that human errors can be reduced by threats of disciplinary action, more stringent procedures, retraining, and so on. Their view is that the error is the fault of the individual responsible for making the mistake.

2. The systems approach The basic assumption is that people are fallible and will make mistakes. They make mistakes because they are under pressure from high workloads, because of poor training, or because of inappropriate system design. Good systems should recognize the possibility of human error and include barriers and safeguards that detect human errors and allow the system to recover before failure occurs. When a failure does occur, the best way to avoid its recurrence is to understand how and why the system defenses did not trap the error. Blaming and punishing the person who triggered the failure does not improve long-term system safety.

I believe that the systems approach is the right one and that systems engineers should assume that human errors will occur during system operation. Therefore, to improve the resilience of a system, designers have to think about the defenses and barriers to human error that could be part of a system. They should also think about whether these barriers should be built into the technical components of the system or be part of the processes, procedures, and guidelines for using the system. For example, two operators may be required to check critical system inputs.

The barriers and safeguards that protect against human errors may be technical or sociotechnical. For example, code to validate all inputs is a technical defense; an approval procedure for critical system updates that needs two people to confirm the update is a sociotechnical defense. Using diverse barriers means that shared vulnerabilities are less likely and that a user error is more likely to be trapped before it leads to system failure.
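A technical input-validation barrier of the kind mentioned above can be sketched as a check applied before operator input reaches the rest of the system. The field, rules, and range used here are illustrative assumptions.

```python
# Sketch of an input-validation barrier: a technical defense that traps
# malformed or implausible operator input before it propagates further.
# The field name and plausible range are illustrative assumptions.

def validate_dosage_input(raw):
    """Return (ok, value_or_error) for a numeric dosage field."""
    try:
        value = float(raw)
    except ValueError:
        return False, "not a number"        # trap mistyped input
    if not (0 < value <= 500):
        return False, "out of plausible range"  # trap slips like "9999"
    return True, value

# e.g. validate_dosage_input("25") passes; "10x" and "9999" are trapped
```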

In general, you should use redundancy and diversity to create a set of defensive layers (Figure 14.5), where each layer uses a different approach to deter attackers or to trap component failures or human errors. In the figure, dark blue barriers are software checks; light blue barriers are checks carried out by people.

As an example of this approach to defense in depth, some of the checks for controller errors that may be part of an air traffic control system include:

1. A conflict alert warning When a controller instructs an aircraft to change its speed or altitude, the system extrapolates its trajectory to see whether it would intersect with that of any other aircraft. If so, it sounds an alarm.

2. Formalized recording procedures for air traffic management The same ATC system may have a clearly defined procedure setting out how to record the control instructions that have been issued to aircraft. These procedures help controllers check whether they have issued instructions correctly and make the information visible to others for checking.

3. Collaborative checking Air traffic control involves a team of controllers who constantly monitor each other’s work. When a controller makes a mistake, the others usually detect and correct it before an incident occurs.


Figure 14.5 Defensive layers: errors or attacks have to pass through a series of barriers, some technical defenses (software checks) and some sociotechnical defenses (checks carried out by people).

Reason (Reason 2000) draws on the idea of defensive layers in a theory of how human errors lead to system failures. He introduces the so-called Swiss cheese model, which suggests that defensive layers are not solid barriers but are instead like slices of Swiss cheese. Some types of Swiss cheese, such as Emmenthal, have holes of varying sizes in them, and Reason suggests that vulnerabilities, or what he calls latent conditions, in the layers are analogous to these holes.

These latent conditions are not static; they change depending on the state of the system and the people involved in system operation. To continue the analogy, the holes change size and move around the defensive layers during system operation. For example, if a system relies on operators checking each other’s work, a possible vulnerability is that both make the same mistake. This is unlikely under normal conditions, so, in the Swiss cheese model, the hole is small. However, when the system is heavily loaded and the workload of both operators is high, mistakes are more likely, and the size of the hole representing this vulnerability increases.

Failure in a system with layered defenses occurs when there is some external trigger event that has the potential to cause damage. This event might be a human error (which Reason calls an active failure), or it could be a cyberattack. If all of the defensive barriers fail, then the system as a whole will fail. Conceptually, this corresponds to the holes in the Swiss cheese slices lining up, as shown in Figure 14.6.

This model suggests that different strategies can be used to increase system resilience to adverse external events:

1. Reduce the probability of the occurrence of an external event that might trigger system failures. To reduce human errors, you may introduce improved training for operators or give operators more control over their workload so that they are not overloaded. To reduce cyberattacks, you may reduce the number of people who have privileged system information and so reduce the chances of disclosure to an attacker.

2. Increase the number of defensive layers. As a general rule, the more layers there are in a system, the less likely it is that the holes will line up and a system failure will occur. However, if these layers are not independent, they may share a common vulnerability: the barriers are then likely to have the same “hole” in the same place, so there is only a limited benefit in adding a new layer.


Figure 14.6 Reason’s Swiss cheese model of system failure: an active failure (a human error) penetrates aligned latent conditions (holes) in the defensive layers, leading to system failure.

3. Design a system so that diverse types of barriers are included. This

means that

the “holes” will probably be in different places, and so there is less chance

of the holes lining up and failing to trap an error.

4. Minimize the number of latent conditions in a system. Effectively, this

means

reducing the number and size of system “holes.” However, this may

significantly

increase systems engineering costs. Reducing the number of bugs in the

system

increases testing and V & V costs. Therefore, this option may not be cost-effective.
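The effect of layer independence on failure probability can be illustrated with a short Monte Carlo sketch. The layer counts and hole probabilities here are arbitrary illustrations, not figures from Reason's work:

```python
import random

def failure_probability(n_layers, hole_prob, shared=False, trials=100_000):
    """Estimate the chance that a triggering event passes every defensive
    layer. With shared=True, all layers share one common vulnerability, so
    their "holes" line up and extra layers add little protection."""
    failures = 0
    for _ in range(trials):
        if shared:
            # One latent condition is common to every layer.
            breached = random.random() < hole_prob
        else:
            # Independent layers: the event must find a hole in each one.
            breached = all(random.random() < hole_prob for _ in range(n_layers))
        failures += breached
    return failures / trials

random.seed(42)
independent = failure_probability(4, 0.1)               # roughly 0.1 ** 4
correlated = failure_probability(4, 0.1, shared=True)   # roughly 0.1
assert independent < correlated
```

With four independent layers, a triggering event almost never finds a hole in all of them; with one shared vulnerability, adding layers barely helps, which is the point of strategy 3 below.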

In designing a system, you need to consider all of these options and make

choices

about what might be the most cost-effective ways to improve the system’s

defenses.

If you are building custom software, then using software checking to

increase the

number and diversity of layers may be the best option. However, if you

are using

off-the-shelf software, then you may have to consider how sociotechnical

defenses

may be added. You may decide to change training procedures to reduce

the chances

of problems occurring and to make it easier to deal with incidents when

they arise.

14.2.2 Operational and management processes

All software systems have associated operational processes that reflect the

assumptions of the designers about how these systems will be used. Some

software systems,

particularly those that control or are interfaced to special equipment, have

trained operators who are an intrinsic part of the control system. Decisions

are made during the design stage about which functions should be part of

the technical system and

which functions should be the operator’s responsibility. For example, in an

imaging

system in a hospital, the operator may have the responsibility of checking

the quality of the images immediately after they have been processed. This

check allows the

imaging procedure to be repeated if there is a problem.

Operational processes are the processes that are involved in using the

system for

its defined purpose. For example, operators of an air traffic control system

follow

specific processes when aircraft enter and leave airspace, when they have

to change

height or speed, when an emergency occurs, and so on. For new systems,

these operational processes have to be defined and documented during the system

development process. Operators may have to be trained and other work processes

adapted to

make effective use of the new system.

422 Chapter 14 Resilience engineering

Most software systems, however, do not have trained operators but have

system

users, who use the system as part of their work or to support their

personal interests.

For personal systems, the designers may describe the expected use of the

system but

have no control over how users will actually behave. For enterprise IT

systems, however, training may be provided for users to teach them how to use the

system.

Although user behavior cannot be controlled, it is reasonable to expect that users will normally follow the defined process.

Enterprise IT systems will also usually have system administrators or

managers

who are responsible for maintaining that system. While they are not part

of the business process supported by the system, their job is to monitor the

software system for errors and problems. If problems arise, system

managers take action to resolve them

and restore the system to its normal operational state.

In the previous section, I discussed the importance of defense in depth and

the use

of diverse mechanisms to check for adverse events that could lead to

system failure.

Operational and management processes are an important defense

mechanism, and,

in designing a process, you need to find a balance between efficient

operation and

problem management. These are often in conflict as shown in Figure 14.7

as increasing efficiency removes redundancy and diversity from a system.

Over the past 25 years, businesses have focused on so-called process

improvement. To improve the efficiency of operational and management

processes, companies study how their processes are enacted and look for particularly

efficient and

inefficient practice. Efficient practice is codified and documented, and

software may be developed to support this “optimum” process. Inefficient

practice is replaced by

more efficient ways of doing things. Sometimes process control

mechanisms are

introduced to ensure that system operators and managers follow this “best

practice.”

The problem with process improvement is that it often makes it harder for

people to

cope with problems. What seems to be “inefficient” practice often arises

because people maintain redundant information or share information

because they know this makes it

easier to deal with problems when things go wrong. For example, air

traffic controllers may print flight details as well as rely on the flight

database because they will then have information about flights in the air if

the system database becomes unavailable.

People have a unique capability to respond effectively to unexpected

situations,

even when they have never had direct experience of these situations.

Therefore,

when things go wrong, operators and system managers can often recover

the situation, although they may sometimes have to break rules and “work around”

the

defined process. You should therefore design operational processes to be

flexible

and adaptable. The operational processes should not be too constraining;

they should not require operations to be done in a particular order; and

the system software

should not rely on a specific process being followed.

For example, an emergency service control room system is used to manage

emergency calls and to initiate a response to these calls. The “normal” process

of handling a call is to log the caller’s details and then send a message to the

appropriate emergency service giving details of the incident and the

address. This procedure

provides an audit trail of the actions taken. A subsequent investigation can

check

that the emergency call has been properly handled.


Figure 14.7 Efficiency and resilience

Efficient process operation | Problem management
Process optimization and control | Process flexibility and adaptability
Information hiding and security | Information sharing and visibility
Automation to reduce operator workload with fewer operators and managers | Manual processes and spare operator/manager capacity to deal with problems
Role specialization | Role sharing

Now imagine that this system is subject to a denial-of-service attack, which makes the messaging system unavailable. Rather than simply not responding to calls, the operators may use their personal mobile phones and their knowledge of call responders to call the emergency service units directly so that they can respond to serious incidents.

Management and provision of information are also important for resilient

operation.

To make a process more efficient, it may make sense to present operators

with the

information they need, when they need it. From a security perspective,

information

should not be accessible unless the operator or manager needs that

information.

However, a more liberal approach to information access can improve

system resilience.

If operators are only presented with information that the process designer

thinks

they “need to know,” then they may be unable to detect problems that do

not directly affect their immediate tasks. When things go wrong, the

system operators do not

have a broad picture of what is happening in the system, so it is more

difficult for them to formulate strategies for dealing with problems. If they

cannot access some

information in the system for security reasons, then they may be unable to

stop

attacks and repair the damage that has been caused.

Automating the system management process means that a single manager

may be

able to manage a large number of systems. Automated systems can detect

common

problems and take actions to recover from these problems. Fewer people

are needed

for system operations and management, and so costs are reduced.

However, process

automation has two disadvantages:

1. Automated management systems may go wrong and take incorrect

actions. As

problems develop, the system may take unexpected actions that make the

situation worse and that cannot be understood by the system managers.

2. Problem solving is a collaborative process. If fewer managers are

available, it is likely to take longer to work out a strategy to recover from

a problem or cyberattack.

Therefore, process automation can have both positive and negative effects

on

system resilience. If the automated system works properly, it can detect

problems,

invoke cyberattack resistance if necessary, and start automated recovery

procedures.

However, if the automated system can’t handle the problem, fewer people

will be

available to tackle the problem and the system may have been damaged

by the process automation doing the wrong thing.

In an environment where there are different types of system and

equipment, it

may be impractical to expect all operators and managers to be able to deal

with all of


the different systems. Individuals may therefore specialize so that they

become

expert and knowledgeable about a small number of systems. This leads to

more efficient operation but has consequences for the resilience of the system.

The problem with role specialization is that there may not be anyone

available at

a particular time who understands the interactions between systems.

Consequently,

it is difficult to cope with problems if the specialist is not available. If

people work with several systems, they come to understand the

dependencies and relationships

between them and so can tackle problems that affect more than one

system. With no

specialist available, it becomes much more difficult to contain the problem

and

repair any damage that has been caused.

You may use risk assessment, as discussed in Chapter 13, to help make

decisions

on the balance between process efficiency and resilience. You consider all

of the

risks where operator or manager intervention may be required and assess

the likelihood of these risks and the extent of the possible losses that might arise.

For risks that may lead to serious damage and extensive loss and for risks

that are likely to

occur, you should favor resilience over process efficiency.
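The risk-driven trade-off described above can be sketched in a few lines. The risks, likelihoods, loss figures, and threshold below are purely illustrative, not values from any real assessment:

```python
# Risk-driven choice between process efficiency and resilience:
# favor resilience where the expected loss is high.
risks = [
    {"name": "server outage during clinic", "likelihood": 0.20, "loss": 50_000},
    {"name": "database corruption",         "likelihood": 0.05, "loss": 200_000},
    {"name": "operator data-entry slip",    "likelihood": 0.60, "loss": 1_000},
]

def favor_resilience(risk, threshold=5_000):
    """Favor resilience over efficiency when the expected loss
    (likelihood x possible loss) exceeds a business-chosen threshold."""
    return risk["likelihood"] * risk["loss"] > threshold

prioritized = [r["name"] for r in risks if favor_resilience(r)]
# The outage and corruption risks (expected loss 10,000 each) exceed the
# threshold; the routine data-entry slip (600) does not.
assert prioritized == ["server outage during clinic", "database corruption"]
```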

14.3 Resilient systems design

Resilient systems can resist and recover from adverse incidents such as

software

failures and cyberattacks. They can deliver critical services with minimal

interruptions and can quickly return to their normal operating state after an

incident has

occurred. In designing a resilient system, you have to assume that system

failures or penetration by an attacker will occur, and you have to include

redundant and diverse features to cope with these adverse events.

Designing systems for resilience involves two closely related streams of

work:

1. Identifying critical services and assets Critical services and assets are those

elements of the system that allow a system to fulfill its primary purpose.

For example, the primary purpose of a system that handles ambulance dispatch in

response to emergency calls is to get help to people who need it as quickly

as

possible. The critical services are those concerned with taking calls and

dispatching ambulances to the medical emergency. Other services such as

call logging and ambulance tracking are less important.

2. Designing system components that support problem recognition, resistance,

recovery, and reinstatement For example, in an ambulance dispatch system,

a watchdog timer (see Chapter 12) may be included to detect if the system

is not

responding to events. Operators may have to authenticate with a hardware

token

to resist the possibility of unauthorized access. If the system fails, calls

may be

diverted to another center so that the essential services are maintained.

Copies

of the system database and software on alternative hardware may be

maintained

to allow for reinstatement after an outage.
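A watchdog timer of the kind mentioned above can be sketched in a few lines. This is an illustrative, single-process version of the idea, not the design of a real dispatch system:

```python
import time

class WatchdogTimer:
    """Minimal watchdog sketch: the monitored system must call kick()
    regularly; if it stops responding for longer than the timeout,
    expired() reports a fault so that recovery can be triggered."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_kick = time.monotonic()

    def kick(self):
        # Called by the monitored system on every heartbeat.
        self.last_kick = time.monotonic()

    def expired(self):
        return time.monotonic() - self.last_kick > self.timeout

watchdog = WatchdogTimer(timeout_seconds=0.05)
watchdog.kick()
assert not watchdog.expired()   # system responding normally
time.sleep(0.1)                 # simulate an unresponsive system
assert watchdog.expired()       # watchdog detects the hang
```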


Figure 14.8 Stages in survivability analysis: (1) review system requirements and architecture; (2) identify critical services and components; (3) identify attacks and compromisable components; (4) identify softspots and survivability strategies.

The fundamental notions of recognition, resistance, and recovery were the

basis

of early work in resilience engineering by Ellison et al. (Ellison et al. 1999, 2002).

They designed a method of analysis called survivable systems analysis.

This method

is used to assess vulnerabilities in systems and to support the design of

system architectures and features that promote system survivability.

Survivable systems analysis is a four-stage process (Figure 14.8) that

analyzes

the current or proposed system requirements and architecture, identifies

critical services, attack scenarios, and system “softspots,” and proposes

changes to improve the survivability of a system. The key activities in

each of these stages are as follows:

1. System understanding For an existing

or proposed system, review the goals of the system (sometimes called the

mission objectives), the system requirements,

and the system architecture.

2. Critical service identification The services that must always be maintained

and the components that are required to maintain these services are

identified.

3. Attack simulation Scenarios or use cases for possible attacks are

identified, along with the system components that would be affected by

these attacks.

4. Survivability analysis Components that are both essential and

compromisable by an attack are identified, and survivability strategies

based on resistance, recognition, and recovery are identified.
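Stage 4 amounts to a set intersection: the “softspots” are the components that are both essential for critical services and compromisable by some attack scenario. A minimal sketch, with illustrative component and attack names:

```python
# Survivability analysis, stage 4: softspots are components that are
# both critical and reachable by at least one attack scenario.
critical_components = {"call-taking", "dispatch", "patient-db"}
compromisable = {
    "denial-of-service": {"dispatch", "web-portal"},
    "sql-injection": {"patient-db"},
}

softspots = {
    component
    for attacked in compromisable.values()
    for component in attacked & critical_components
}
# "dispatch" and "patient-db" need resistance, recognition, and
# recovery strategies; "web-portal" is attackable but not critical.
assert softspots == {"dispatch", "patient-db"}
```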

The fundamental problem with this approach to survivability analysis is

that its

starting point is the requirements and architecture documentation for a

system. This is a reasonable assumption for defense systems (the work was

sponsored by the U.S.

Department of Defense), but it poses two problems for business systems:

1. It is not explicitly related to the business requirements for resilience. I

believe that these are a more appropriate starting point than technical

system requirements.


2. It assumes that there is a detailed requirements statement for a system. In fact, resilience may have to be “retrofitted” to a system where there is no complete or up-to-date requirements document. For new systems, resilience may itself be a requirement, or systems may be developed using an agile approach. The system architecture may be designed to take resilience into account.

Figure 14.9 Resilience engineering. (The figure shows five streams of work: (1) identify business resilience requirements; (2) a reinstatement strategy (plan backup, plan system reinstatement, develop software to support reinstatement); (3) recognition and resistance (identify critical services, identify assets that deliver critical services, identify events that compromise assets, plan event recognition and resistance); (4) a recovery strategy (plan critical service recovery, plan critical asset recovery, design asset redundancy, develop software to support asset recovery); (5) testing (resilience test planning, identify attack and failure scenarios, test system resistance, test service recovery, test system reinstatement). These streams lead to proposed software changes and to buying new software where required.)

A more general resilience engineering method, as shown in Figure 14.9,

takes the

lack of detailed requirements into account as well as explicitly designing

recovery

and reinstatement into the system. For the majority of components in a

system, you

will not have access to their source code and will not be able to make

changes to

them. Your strategy for resilience has to be designed with this limitation

in mind.

There are five interrelated streams of work in this approach to resilience

engineering:

1. You identify business resilience requirements. These

requirements set out how

the business as a whole must maintain the services that it delivers to

customers

and, from this, resilience requirements for individual systems are

developed.

Providing resilience is expensive, and it is important not to overengineer

systems with unnecessary resilience support.

2. You plan how to reinstate a system or a set of systems to their normal

operating state after an adverse event. This plan has to be integrated with

the business’s


normal backup and archiving strategy that allows recovery of information

after

a technical or human error. It should also be part of a wider disaster

recovery

strategy. You have to take account of the possibility of physical events

such as

fire and flooding and study how to maintain critical information in

separate

locations. You may decide to use cloud backups for this plan.

3. You identify system failures and cyberattacks that can compromise a

system, and

you design recognition and resilience strategies to cope with these adverse

events.

4. You plan how to recover critical services quickly after they have been

damaged

or taken offline by a failure or cyberattack. This step usually involves

providing

redundant copies of the critical assets that provide these services and

switching

to these copies when required.

5. Critically, you should test all aspects of your resilience planning. This

testing involves identifying failure and attack scenarios and playing these

scenarios out

against your system.

Maintaining the availability of critical services is the essence of resilience.

Accordingly, you have to know:

the system services that are the most critical for a business,

the minimal quality of service that must be maintained,

how these services might be compromised,

how these services can be protected, and

how you can recover quickly if the services become unavailable.

As part of the analysis of critical services, you have to identify the system

assets that are essential for delivering these services. These assets may be

hardware (servers, network, etc.), software, data, and people. To build a

resilient system, you have to think about how to use redundancy and

diversity to ensure that these assets remain available in the event of a

system failure.

For all of these activities, the key to providing a rapid response and

recovery plan after an adverse event is to have additional software that

supports resistance, recovery, and reinstatement. This may be commercial

security software or resilience support that is programmed into application systems. It may also include

scripts and

specially written programs that are developed for recovery and

reinstatement. If you have the right support software, the processes of

recovery and reinstatement can be

partially automated and quickly invoked and executed after a system

failure.

Resilience testing involves simulating possible system failures and

cyberattacks to

test whether the resilience plans that have been drawn up work as

expected. Testing

is essential because we know from experience that the assumptions made

in resilience planning are often invalid and that planned actions do not always

work. Testing for resilience can reveal these problems so that the

resilience plan can be refined.


Figure 14.10 The client–server architecture of the Mentcare system. (Three Mentcare clients are connected through a network to the Mentcare server, which manages the patient database.)

Testing can be very difficult and expensive as, obviously, the testing

cannot be carried out on an operational system. The system and its

environment may have to be duplicated for testing, and staff may have to

be released from their normal responsibilities to work on the test system.

To reduce costs, you can use “desk testing.” The testing team assumes a

problem has occurred and tests their reactions to it; they do not simulate

that problem on a real system. While this approach can provide useful

information about system resilience, it is less effective than testing in

discovering deficiencies in the resilience plan.
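A desk test of this kind can also be expressed as an executable scenario against a simplified model of the system. The service class and failure scenario below are illustrative assumptions, not Mentcare code:

```python
# A scenario-based resilience test: inject a failure and check that the
# critical service is still delivered in a degraded mode.
class PatientInfoService:
    def __init__(self, server_up, cache):
        self.server_up = server_up
        self.cache = cache          # summary records held locally

    def lookup(self, patient_id):
        if self.server_up:
            return f"full record for {patient_id}"
        # Degraded mode: fall back to the local summary record.
        return self.cache.get(patient_id)

def test_service_survives_server_failure():
    service = PatientInfoService(server_up=True, cache={"p1": "summary for p1"})
    service.server_up = False    # inject the failure scenario
    # Critical service maintained for patients with local summaries.
    assert service.lookup("p1") == "summary for p1"
    # Known limitation of the fallback: no summary, no record.
    assert service.lookup("p2") is None

test_service_survives_server_failure()
```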

As an example of this approach, let us look at resilience engineering for

the

Mentcare system. To recap, this system is used to support clinicians

treating patients in a variety of locations who have mental health

problems. It provides patient information and records of consultations

with doctors and specialist nurses. It includes a number of checks that can

flag patients who may be potentially dangerous or suicidal. Figure 14.10

shows the architecture of this system.

The system is consulted by doctors and nurses before and during a

consultation,

and patient information is updated after the consultation. To ensure the

effectiveness of clinics, the business resilience requirements are that the

critical system services are available during normal working hours, that

the patient data should not be permanently damaged or lost by a system

failure or cyberattack, and that patient information should not be released to unauthorized people.

Two critical services in the system have to be maintained:

1. An information service that provides information about a patient’s

current diagnosis and treatment plan.

2. A warning service that highlights patients who could pose a danger to

others or to themselves.

Notice that the critical service is not the availability of the complete

patient

record. Doctors and nurses only need to go back to previous treatments

occasionally,


so clinical care is not seriously affected if a full record is not available.

Therefore, it is possible to deliver effective care using a summary record

that only includes information about the patient and recent treatment.

The assets required to deliver these services in normal system operations

are:

1. The patient record database that maintains all patient information.

2. A database server that provides access to the database for local client

computers.

3. A network for client/server communications.

4. Local laptop or desktop computers used by clinicians to access patient

information.

5. A set of rules that identify patients who are potentially dangerous and

that can flag patient records. Client software highlights dangerous patients

to system users.
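The flagging rules in item 5 might be represented as simple predicates over patient records. The rule names and record fields below are illustrative assumptions, not the Mentcare system's actual rules:

```python
# Rule-based flagging of patients who may pose a danger to themselves
# or others. Each rule is a (name, predicate) pair over a record.
warning_rules = [
    ("self-harm risk", lambda p: p.get("self_harm_history", False)),
    ("violence risk",  lambda p: p.get("violent_incidents", 0) > 0),
]

def flags_for(patient):
    """Return the names of all warning rules that apply to a record;
    client software would highlight any record with a non-empty list."""
    return [name for name, applies in warning_rules if applies(patient)]

patient = {"id": "p17", "self_harm_history": True, "violent_incidents": 0}
assert flags_for(patient) == ["self-harm risk"]
```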

To plan recognition, resistance, and recovery strategies, you need to

develop a set

of scenarios that anticipate adverse events that might compromise the

critical services offered by the system. Examples of these adverse events are:

1. The unavailability of the database server either through a system

failure, a

network failure, or a denial-of-service cyberattack.

2. The deliberate or accidental corruption of the patient record database

or the

rules that define what is meant by a “dangerous patient.”

3. Infection of client computers with malware.

4. Access to client computers by unauthorized people who gain access to

patient records.

Figure 14.11 shows possible recognition and resistance strategies for these

adverse events. Notice that these are not just technical approaches but also

include workshops to inform system users about security issues. We know

that many security breaches arise because users inadvertently reveal

privileged information to an

attacker and these workshops reduce the chances of this happening. I

don’t have

space here to discuss all of the options that I identified in Figure 14.11.

Instead, I focus on how the system architecture can be modified to be

more resilient.

In Figure 14.11, I suggested that maintaining patient information on client

computers was a possible redundancy strategy that could help maintain

critical services.

This leads to the modified software architecture shown in Figure 14.12.

The key

features of this architecture are:

1. Summary patient records that are maintained on local client computers The

local computers can communicate directly with each other and exchange

information using either the system network or, if necessary, an ad hoc

network created using mobile phones. Therefore, if the database is unavailable,

doctors and

nurses can still access essential patient information. (resistance and

recovery)

2. A backup server to allow for main server failure This server is responsible for taking regular snapshots of the database as backups. In the event the main server fails, it can also act as the main server for the whole system. This provides continuity of service and recovery after a server failure (resistance and recovery).

Figure 14.11 Recognition and resistance strategies for adverse events

Event: Server unavailability
Recognition: (1) Watchdog timer on client that times out if there is no response to client access. (2) Text messages from system managers to clinical users.
Resistance: (1) Design system architecture to maintain local copies of critical information. (2) Provide peer-to-peer search across clients for patient data. (3) Provide staff with smartphones that can be used to access the network in the event of server failure. (4) Provide backup server.

Event: Patient database corruption
Recognition: (1) Record-level cryptographic checksums. (2) Regular auto-checking of database integrity. (3) Reporting system for incorrect information.
Resistance: (1) Replayable transaction log to update database backup with recent transactions. (2) Maintenance of local copies of patient information and software to restore the database from local copies and backups.

Event: Malware infection of client computers
Recognition: (1) Reporting system so that computer users can report unusual behavior. (2) Automated malware checks on startup.
Resistance: (1) Security awareness workshops for all system users. (2) Disabling of USB ports on client computers. (3) Automated system setup for new clients. (4) Support access to system from mobile devices. (5) Installation of security software.

Event: Unauthorized access to patient information
Recognition: (1) Warning text messages from users about possible intruders. (2) Log analysis for unusual activity.
Resistance: (1) Multilevel system authentication process. (2) Disabling of USB ports on client computers. (3) Access logging and real-time log analysis. (4) Security awareness workshops for all system users.

3. Database integrity checking and recovery software Integrity checking runs

as a background task checking for signs of database corruption. If

corruption is discovered, it can automatically initiate the recovery of some

or all of the data from

backups. The transaction log allows these backups to be updated with

details of

recent changes (recognition and recovery).
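The combination of record-level checksums, a backup snapshot, and a replayable transaction log can be sketched as follows. The dictionary-based record store and the update protocol are illustrative assumptions:

```python
import copy
import hashlib

def checksum(record):
    """Record-level checksum used to recognize silent corruption."""
    return hashlib.sha256(repr(sorted(record.items())).encode()).hexdigest()

database = {"p1": {"name": "A", "plan": "weekly review"}}
checksums = {pid: checksum(rec) for pid, rec in database.items()}
backup = copy.deepcopy(database)     # snapshot taken by the backup server
transaction_log = []                 # replayable log of changes since backup

def update(pid, field, value):
    database[pid][field] = value
    checksums[pid] = checksum(database[pid])
    transaction_log.append((pid, field, value))

update("p1", "plan", "fortnightly review")

# Simulate silent corruption that bypasses the normal update path.
database["p1"]["plan"] = "???"
corrupted = [pid for pid, rec in database.items() if checksum(rec) != checksums[pid]]
assert corrupted == ["p1"]           # recognition

# Recovery: restore the backup, then replay the logged transactions so
# that changes made since the snapshot are not lost.
database = copy.deepcopy(backup)
for pid, field, value in transaction_log:
    database[pid][field] = value
assert database["p1"]["plan"] == "fortnightly review"
```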

To maintain the key services of patient information access and staff

warning, we

can make use of the inherent redundancy in a client-server system. By

downloading

information to the client at the start of a clinic session, the consultation

can continue without server access. Only the information about the

patients who are scheduled to

attend consultations that day needs to be downloaded. If there is a need to

access

other patient information and the server is unavailable, then other client

computers may be contacted using peer-to-peer communication to see if

the information is

available on them.
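The local-copy and peer-to-peer fallback strategy can be sketched as below. The class and method names are illustrative assumptions, not the Mentcare implementation:

```python
# Redundancy through local summary records plus peer-to-peer search:
# a client checks its own cache first, then asks other clients.
class MentcareClient:
    def __init__(self, local_summaries):
        self.local = local_summaries   # records downloaded for today's clinic
        self.peers = []                # other clients reachable peer-to-peer

    def find_summary(self, patient_id):
        if patient_id in self.local:
            return self.local[patient_id]
        # Server unavailable: peer-to-peer search across other clients.
        for peer in self.peers:
            if patient_id in peer.local:
                return peer.local[patient_id]
        return None

clinic_a = MentcareClient({"p1": "summary p1"})
clinic_b = MentcareClient({"p2": "summary p2"})
clinic_a.peers = [clinic_b]

assert clinic_a.find_summary("p1") == "summary p1"   # local copy
assert clinic_a.find_summary("p2") == "summary p2"   # fetched from a peer
assert clinic_a.find_summary("p9") is None           # genuinely unavailable
```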

The service that provides a warning to staff of patients who may be

dangerous

can easily be implemented using this approach. The records of patients

who may

harm themselves or others are identified before the download process.

When clinical

staff access these records, the software can highlight the records to

indicate the

patients that require special care.


Figure 14.12 An architecture for Mentcare system resilience. (Three Mentcare clients, each holding summary patient records, are connected through a network to the Mentcare server and to a backup server. The Mentcare server manages the patient database and runs a database integrity checker; the backup server holds the database backup and the transaction log.)

The features in this architecture that support the resistance to adverse

events are

also useful in supporting recovery from these events. By maintaining

multiple copies of information and having backup hardware available,

critical system services can

be quickly restored to normal operation. Because the system need only be

available

during normal working hours (say, 8 a.m to 6 p.m), the system can be

reinstated

overnight so that it is available for the following day after a failure.

As well as maintaining critical services, the other business requirements of

maintaining the confidentiality and integrity of patient data must also be

supported. The architecture shown in Figure 14.12 includes a backup

system and explicit database

integrity checking to reduce the chances that patient information is

damaged accidentally or in a malicious attack. Information on client computers is also

available and can be used to support recovery from data corruption or

damage.

While maintaining multiple copies of data is a safeguard against data

corruption,

it poses a risk to confidentiality as all of these copies have to be secured.

In this case, this risk can be controlled by:

1. Only downloading the summary records of patients who are scheduled

to attend

a clinic. This limits the number of records that could be compromised.

2. Encrypting the disk on local client computers. Attackers who do not

have the

encryption key cannot read the disk if they gain access to the computer.

3. Securely deleting the downloaded information at the end of a clinic

session. This further reduces the chances of an attacker gaining access to

confidential information.


4. Ensuring that all network transactions are encrypted. If an attacker

intercepts

these transactions, they cannot get access to the information.
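Controls 1 and 3 above can be sketched as a simple download-and-clear session protocol. The record data and function names are illustrative:

```python
# Confidentiality controls: download only the summary records of
# patients scheduled for today's clinic, and clear the local copies
# when the session ends.
all_records = {"p1": "summary p1", "p2": "summary p2", "p3": "summary p3"}
scheduled_today = {"p1", "p3"}

def download_for_clinic(records, scheduled):
    """Control 1: limit local exposure to scheduled patients only."""
    return {pid: rec for pid, rec in records.items() if pid in scheduled}

session_cache = download_for_clinic(all_records, scheduled_today)
assert set(session_cache) == {"p1", "p3"}   # "p2" never leaves the server

# Control 3: delete the downloaded information at the end of the session.
session_cache.clear()
assert session_cache == {}
```

Note that clear() here only models the intent; truly secure deletion of on-disk copies needs operating-system or disk-encryption support.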

Because of performance degradation, it is probably impractical to encrypt

the entire patient database on the server. Strong authentication should

therefore be used to

protect this information.

Key Points

The resilience of a system is a judgment of how well that system can

maintain the continuity of its critical services in the presence of disruptive

events, such as equipment failure and cyberattacks.

Resilience should be based on the 4 Rs model—recognition, resistance,

recovery, and reinstatement.

Resilience planning should be based on the assumption that networked

systems will be subject to cyberattacks by malicious insiders and outsiders

and that some of these attacks will be successful.

Systems should be designed with a number of defensive layers of

different types. If these layers are effective, human and technical failures

can be trapped and cyberattacks resisted.

To allow system operators and managers to cope with problems,

processes should be flexible and adaptable. Process automation can make

it more difficult for people to cope with problems.

Business resilience requirements should be the starting point for

designing systems for resilience. To achieve system resilience, you have to

focus on recognition and recovery from problems, recovery of critical

services and assets, and reinstatement of the system.

An important part of design for resilience is identifying critical services,

which are those services that are essential if a system is to ensure its

primary purpose. Systems should be designed so that these services are

protected and, in the event of failure, recovered as quickly as possible.

Further reading

“Survivable Network System Analysis: A Case Study.” An excellent paper

that introduces the notion of system survivability and uses a case study of

a mental health record treatment system to illustrate the application of a

survivability method. (R. J. Ellison, R. C. Linger, T. Longstaff, and N. R.

Mead, IEEE Software, 16 (4), July/August 1999) http://dx.doi.org/10.1109/52.776952

Resilience Engineering in Practice: A Guidebook. This is a collection of

articles and case studies on resilience engineering that takes a broad,

sociotechnical systems perspective. (E. Hollnagel, J. Paries, D. W. Woods,

and J. Wreathall, Ashgate Publishing Co., 2011).

“Cyber Risk and Resilience Management.” This is a website with a wide

range of resources on cybersecurity and resilience, including a model for

resilience management. (Software Engineering Institute, 2013) https://

www.cert.org/resilience/


Web Site

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/security-and-resilience/

Exercises

14.1. Explain how the complementary strategies of resistance, recognition, recovery, and reinstatement may be used to provide system resilience.

14.2. What are the types of threats that have to be considered in resilience planning? Provide examples of the controls that organizations should put in place to counter those threats.

14.3. Describe the ways in which human error can be viewed according to Reason (Reason, 2000) and the strategies that can be used to increase resilience according to the Swiss cheese model (Figure 14.6).

14.4. A hospital proposes to introduce a policy that any member of clinical staff (doctors or nurses) who takes or authorizes actions that lead to a patient being injured will be subject to criminal charges. Explain why this is a bad idea that is unlikely to improve patient safety, and why it is likely to adversely affect the resilience of the organization.

14.5. What is survivable systems analysis, and what are the key activities in each of the four stages involved in it, as shown in Figure 14.8?

14.6. Explain why process inflexibility can inhibit the ability of a sociotechnical system to resist and recover from adverse events such as cyberattacks and software failure. If you have experience of process inflexibility, illustrate your answer with examples from your experience.

14.7. Suggest how the approach to resilience engineering that I proposed in Figure 14.9 could be used in conjunction with an agile development process for the software in the system. What problems might arise in using agile development for systems where resilience is important?

14.8. In Section 13.4.2, (1) an unauthorized user places malicious orders to move prices and (2) an intrusion corrupts the database of transactions that have taken place. For each of these cyberattacks, identify resistance, recognition, and recovery strategies that might be used.

14.9. In Figure 14.11, I suggested a number of adverse events that could affect the Mentcare system. Draw up a test plan for this system that sets out how you could test the ability of the Mentcare system to recognize, resist, and recover from these events.

14.10. A senior manager in a company is concerned about insider attacks from disaffected staff on the company’s IT assets. As part of a resilience improvement program, she proposes that a logging system and data analysis software be introduced to capture and analyze all employee actions, but that employees should not be told about this system. Discuss the ethics of both introducing a logging system and doing so without telling system users.

434 Chapter 14 Resilience engineering

References

Ellison, R. J., R. C. Linger, T. Longstaff, and N. R. Mead. 1999. “Survivable Network System Analysis: A Case Study.” IEEE Software 16 (4): 70–77. doi:10.1109/52.776952.

Ellison, R. J., R. C. Linger, H. Lipson, N. R. Mead, and A. Moore. 2002. “Foundations of Survivable Systems Engineering.” Crosstalk: The Journal of Defense Software Engineering 12: 10–15. http://resources.sei.cmu.edu/asset_files/WhitePaper/2002_019_001_77700.pdf

Hollnagel, E. 2006. “Resilience—the Challenge of the Unstable.” In Resilience Engineering: Concepts and Precepts, edited by E. Hollnagel, D. D. Woods, and N. G. Leveson, 9–18.

Hollnagel, E. 2010. “RAG—The Resilience Analysis Grid.” In Resilience Engineering in Practice, edited by E. Hollnagel, J. Paries, D. Woods, and J. Wreathall, 275–295. Farnham, UK: Ashgate Publishing Group.

InfoSecurity. 2013. “Global Cybercrime, Espionage Costs $100–$500 Billion Per Year.” http://www.infosecurity-magazine.com/view/33569/global-cybercrime-espionage-costs-100500-billion-per-year

Laprie, J-C. 2008. “From Dependability to Resilience.” In 38th Int. Conf. on Dependable Systems and Networks. Anchorage, Alaska. http://2008.dsn.org/fastabs/dsn08fastabs_laprie.pdf

Reason, J. 2000. “Human Error: Models and Management.” British Medical J. 320: 768–770. doi:10.1136/bmj.320.7237.768.

PART 3 Advanced Software Engineering

This part of the book covers more advanced software engineering topics. I assume in these chapters that readers understand the basics of the discipline, covered in Chapters 1–9.

Chapters 15–18 focus on the dominant development paradigm for web-based information systems and enterprise systems: software reuse. Chapter 15 introduces the topic and explains the different types of reuse that are possible. I then cover the most common approach to reuse, which is the reuse of application systems. These are configured and adapted to the specific needs of each business.

Chapter 16 is concerned with the reuse of software components rather than entire software systems. In this chapter, I explain what is meant by a component and why standard component models are needed for effective component reuse. I also discuss the general process of component-based software engineering and the problems of component composition.

The majority of large systems are now distributed systems, and Chapter 17 covers the issues and problems of building distributed systems. I introduce the client-server approach as a fundamental paradigm of distributed systems engineering and explain ways of implementing this architectural style. The final section explains software as a service: the delivery of software functionality over the Internet, which has changed the market for software products.

Chapter 18 introduces the related topic of service-oriented architectures, which link the notions of distribution and reuse. Services are reusable software components whose functionality can be accessed over the Internet. I discuss two widely used approaches to service development, namely SOAP-based and RESTful services. I explain what is involved in creating services (service engineering) and composing services to create new software systems.

The focus of Chapters 19–21 is systems engineering. In Chapter 19, I introduce the topic and explain why it is important that software engineers understand systems engineering. I discuss the systems engineering life cycle and the importance of procurement in that life cycle.

Chapter 20 covers systems of systems (SoS). The large systems that we will build in the 21st century will not be developed from scratch but will be created by integrating existing complex systems. I explain why an understanding of complexity is important in SoS development and discuss architectural patterns for complex systems of systems.

Most software systems are not apps or business systems but are embedded real-time systems. Chapter 21 covers this important topic. I introduce the idea of a real-time embedded system and describe architectural patterns that are used in embedded systems design. I then explain the process of timing analysis and conclude the chapter with a discussion of real-time operating systems.

15

Software reuse

Objectives

The objectives of this chapter are to introduce software reuse and to describe approaches to system development based on large-scale software reuse. When you have read this chapter, you will:

■ understand the benefits and problems of reusing software when developing new systems;

■ understand the concept of an application framework as a set of reusable objects and how frameworks can be used in application development;

■ have been introduced to software product lines, which are made up of a common core architecture and reusable components that are configured for each version of the product;

■ have learned how systems can be developed by configuring and composing off-the-shelf application software systems.

Contents

15.1 The reuse landscape
15.2 Application frameworks
15.3 Software product lines
15.4 Application system reuse


Reuse-based software engineering is a software engineering strategy where the development process is geared to reusing existing software. Until around 2000, systematic software reuse was uncommon, but it is now used extensively in the development of new business systems. The move to reuse-based development has been in response to demands for lower software production and maintenance costs, faster delivery of systems, and increased software quality. Companies see their software as a valuable asset. They are promoting reuse of existing systems to increase their return on software investments.

Reusable software of different kinds is now widely available. The open-source movement has meant that there is a huge code base that can be reused. This may be in the form of program libraries or entire applications. Many domain-specific application systems, such as ERP systems, are available that can be tailored and adapted to customer requirements. Some large companies provide a range of reusable components for their customers. Standards, such as web service standards, have made it easier to develop software services and reuse them across a range of applications.

Reuse-based software engineering is an approach to development that tries to maximize the reuse of existing software. The software units that are reused may be of radically different sizes. For example:

1. System reuse Complete systems, which may be made up of a number of application programs, may be reused as part of a system of systems (Chapter 20).

2. Application reuse An application may be reused by incorporating it without change into other systems or by configuring the application for different customers. Alternatively, application families or software product lines that have a common architecture, but that are adapted to individual customer requirements, may be used to develop a new system.

3. Component reuse Components of an application, ranging in size from subsystems to single objects, may be reused. For example, a pattern-matching system developed as part of a text-processing system may be reused in a database management system. Components may be hosted on the cloud or on private servers and may be accessible through an application programming interface (API) as services.

4. Object and function reuse Software components that implement a single function, such as a mathematical function, or an object class may be reused. This form of reuse, designed around standard libraries, has been common for the past 40 years. Many libraries of functions and classes are freely available. You reuse the classes and functions in these libraries by linking them with newly developed application code. In areas such as mathematical algorithms and graphics, where specialized, expensive expertise is needed to develop efficient objects and functions, reuse is particularly cost-effective.
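To make this last form of reuse concrete, here is a minimal sketch (the function and data are invented for illustration) of application code that links against a standard library rather than reimplementing well-understood algorithms:

```python
# Object and function reuse: link tried-and-tested library code into new
# application code instead of reimplementing it.
import statistics  # reused standard-library module


def summarize_response_times(times_ms):
    """Application-specific code that reuses library functions."""
    return {
        "mean": statistics.mean(times_ms),      # reused function
        "median": statistics.median(times_ms),  # reused function
        "stdev": statistics.stdev(times_ms),    # reused function
    }


print(summarize_response_times([120, 150, 130, 170, 110]))
```

The application contributes only the domain-specific wrapper; the statistical algorithms themselves are reused, already-validated code.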

All software systems and components that include generic functionality are potentially reusable. However, these systems or components are sometimes so specific that it is very expensive to modify them for a new situation. Rather than reuse the code, however, you can reuse the ideas that are the basis of the software. This is called concept reuse.

Figure 15.1 Benefits of software reuse

Accelerated development: Bringing a system to market as early as possible is often more important than overall development costs. Reusing software can speed up system production because both development and validation time may be reduced.

Effective use of specialists: Instead of doing the same work over and over again, application specialists can develop reusable software that encapsulates their knowledge.

Increased dependability: Reused software, which has been tried and tested in working systems, should be more dependable than new software. Its design and implementation faults should have been found and fixed.

Lower development costs: Development costs are proportional to the size of the software being developed. Reusing software means that fewer lines of code have to be written.

Reduced process risk: The cost of existing software is already known, while the costs of development are always a matter of judgment. This is an important factor for project management because it reduces the margin of error in project cost estimation. This is especially true when large software components such as subsystems are reused.

Standards compliance: Some standards, such as user interface standards, can be implemented as a set of reusable components. For example, if menus in a user interface are implemented using reusable components, all applications present the same menu formats to users. The use of standard user interfaces improves dependability because users make fewer mistakes when presented with a familiar interface.

In concept reuse you do not reuse a software component; rather, you reuse an idea, a way of working, or an algorithm. The concept that you reuse is represented in an abstract notation, such as a system model, which does not include implementation detail. It can, therefore, be configured and adapted for a range of situations. Concept reuse is embodied in approaches such as design patterns (Chapter 7), configurable system products, and program generators. When concepts are reused, the reuse process must include an activity where the abstract concepts are instantiated to create executable components.
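As an illustration of this instantiation step, the sketch below takes one reusable concept, the Observer design pattern from Chapter 7, and instantiates it as executable, application-specific classes. All class and method names here are invented for illustration:

```python
# Concept reuse: the Observer idea (notify dependents when state changes)
# is instantiated as application-specific, executable classes.
class Observable:
    """The reusable concept, expressed abstractly."""
    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def notify(self, event):
        for observer in self._observers:
            observer.update(event)


class StockLevel(Observable):
    """Concrete instantiation of the concept for one application."""
    def __init__(self, level):
        super().__init__()
        self.level = level

    def remove(self, n):
        self.level -= n
        if self.level < 10:
            self.notify(("low-stock", self.level))


class ReorderClerk:
    def __init__(self):
        self.events = []

    def update(self, event):
        self.events.append(event)  # in a real system: raise a reorder


stock = StockLevel(12)
clerk = ReorderClerk()
stock.attach(clerk)
stock.remove(5)
print(clerk.events)
```

The abstract idea (state changes trigger notifications) carries across domains unchanged; only the concrete classes are written anew for each application.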

An obvious advantage of software reuse is that overall development costs are lower. Fewer software components need to be specified, designed, implemented, and validated. However, cost reduction is only one benefit of software reuse. I have listed other advantages of reusing software in Figure 15.1.

However, there are also costs and difficulties associated with reuse (Figure 15.2). There is a significant cost associated with understanding whether or not a component is suitable for reuse in a particular situation, and in testing that component to ensure its dependability. These additional costs mean that the savings in development costs may be less than anticipated. However, the other benefits of reuse still apply.


Figure 15.2 Problems with software reuse

Creating, maintaining, and using a component library: Populating a reusable component library and ensuring the software developers can use this library can be expensive. Development processes have to be adapted to ensure that the library is used.

Finding, understanding, and adapting reusable components: Software components have to be discovered in a library, understood, and sometimes adapted to work in a new environment. Engineers must be reasonably confident of finding a component in the library before they include a component search as part of their normal development process.

Increased maintenance costs: If the source code of a reused software system or component is not available, then maintenance costs may be higher because the reused elements of the system may become incompatible with changes made to the system.

Lack of tool support: Some software tools do not support development with reuse. It may be difficult or impossible to integrate these tools with a component library system. The software process assumed by these tools may not take reuse into account. This is more likely to be the case for tools that support embedded systems engineering than for object-oriented development tools.

“Not-invented-here” syndrome: Some software engineers prefer to rewrite components because they believe they can improve on them. This is partly to do with trust and partly to do with the fact that writing original software is seen as more challenging than reusing other people’s software.

As I discussed in Chapter 2, software development processes have to be adapted to take reuse into account. In particular, there has to be a requirements refinement stage where the requirements for the system are modified to reflect the reusable software that is available. The design and implementation stages of the system may also include explicit activities to look for and evaluate candidate components for reuse.

15.1 The reuse landscape

Over the past 20 years, many techniques have been developed to support software reuse. These techniques exploit the facts that systems in the same application domain are similar and have potential for reuse, that reuse is possible at different levels from simple functions to complete applications, and that standards for reusable components facilitate reuse. Figure 15.3 shows the “reuse landscape”: different ways of implementing software reuse. Each of these approaches to reuse is briefly described in Figure 15.4.

Given this array of techniques for reuse, the key question is “which is the most appropriate technique to use in a particular situation?” Obviously, the answer to this question depends on the requirements for the system being developed, the technology and reusable assets available, and the expertise of the development team.

Figure 15.3 The reuse landscape (a diagram grouping the approaches: design patterns, architectural patterns, application frameworks, software product lines, application system integration, ERP systems, systems of systems, configurable application systems, legacy system wrapping, component-based software engineering, model-driven engineering, service-oriented systems, aspect-oriented software engineering, program generators, and program libraries)

Key factors that you should consider when planning reuse are:

1. The development schedule for the software If the software has to be developed quickly, you should try to reuse complete systems rather than individual components. Although the fit to requirements may be imperfect, this approach minimizes the amount of development required.

2. The expected software lifetime If you are developing a long-lifetime system, you should focus on the maintainability of the system. You should not just think about the immediate benefits of reuse but also of the long-term implications. Over its lifetime, you will have to adapt the system to new requirements, which will mean making changes to parts of the system. If you do not have access to the source code of the reusable components, you may prefer to avoid off-the-shelf components and systems from external suppliers. These suppliers may not be able to continue support for the reused software. You may decide that it is safer to reuse open-source systems and components (Chapter 7) as this means you can access and keep copies of the source code.

3. The background, skills and experience of the development team All reuse technologies are fairly complex, and you need quite a lot of time to understand and use them effectively. Therefore, you should focus your reuse effort in areas where your development team has expertise.

4. The criticality of the software and its non-functional requirements For a critical system that has to be certified by an external regulator, you may have to create a safety or security case for the system (discussed in Chapter 12). This is difficult if you don’t have access to the source code of the software. If your software has stringent performance requirements, it may be impossible to use strategies such as model-driven engineering (MDE) (Chapter 5). MDE relies on generating code from a reusable domain-specific model of a system. However, the code generators used in MDE often generate relatively inefficient code.


Figure 15.4 Approaches that support software reuse

Application frameworks: Collections of abstract and concrete classes are adapted and extended to create application systems.

Application system integration: Two or more application systems are integrated to provide extended functionality.

Architectural patterns: Standard software architectures that support common types of application system are used as the basis of applications. Described in Chapters 6, 11, and 17.

Aspect-oriented software development: Shared components are woven into an application at different places when the program is compiled. Described in web Chapter 31.

Component-based software engineering: Systems are developed by integrating components (collections of objects) that conform to component-model standards. Described in Chapter 16.

Configurable application systems: Domain-specific systems are designed so that they can be configured to the needs of specific system customers.

Design patterns: Generic abstractions that occur across applications are represented as design patterns showing abstract and concrete objects and interactions. Described in Chapter 7.

ERP systems: Large-scale systems that encapsulate generic business functionality and rules are configured for an organization.

Legacy system wrapping: Legacy systems (Chapter 9) are “wrapped” by defining a set of interfaces and providing access to these legacy systems through these interfaces.

Model-driven engineering: Software is represented as domain models and implementation-independent models, and code is generated from these models. Described in Chapter 5.

Program generators: A generator system embeds knowledge of a type of application and is used to generate systems in that domain from a user-supplied system model.

Program libraries: Class and function libraries that implement commonly used abstractions are available for reuse.

Service-oriented systems: Systems are developed by linking shared services, which may be externally provided. Described in Chapter 18.

Software product lines: An application type is generalized around a common architecture so that it can be adapted for different customers.

Systems of systems: Two or more distributed systems are integrated to create a new system. Described in Chapter 20.

5. The application domain In many application domains, such as manufacturing and medical information systems, there are generic products that may be reused by configuring them to a local situation. This is one of the most effective approaches to reuse, and it is almost always cheaper to buy rather than build a new system.


Generator-based reuse

Generator-based reuse involves incorporating reusable concepts and knowledge into automated tools and providing an easy way for tool users to integrate specific code with this generic knowledge. This approach is usually most effective in domain-specific applications. Known solutions to problems in that domain are embedded in the generator system and selected by the user to create a new system.

http://software-engineering-book.com/web/generator-reuse/
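As a toy illustration of this idea (not a real generator system), the sketch below embeds trivial domain knowledge, how to validate typed record fields, in a generator that produces executable code from a user-supplied model:

```python
# Generator-based reuse: domain knowledge (how to check each field type) is
# embedded in the generator; the user supplies only a declarative model.
FIELD_CHECKS = {  # reusable domain knowledge embedded in the generator
    "int": "isinstance(value, int)",
    "str": "isinstance(value, str)",
}


def generate_validator(model):
    """Generate validator source code from a field -> type model."""
    lines = ["def validate(record):"]
    for field, ftype in model.items():
        check = FIELD_CHECKS[ftype]
        lines.append(f"    value = record.get({field!r})")
        lines.append(f"    if not ({check}): return False")
    lines.append("    return True")
    return "\n".join(lines)


# The "user model" selects which embedded solutions to apply.
source = generate_validator({"age": "int", "name": "str"})
namespace = {}
exec(source, namespace)  # compile the generated system
print(namespace["validate"]({"age": 30, "name": "Ada"}))   # → True
print(namespace["validate"]({"age": "old", "name": "Ada"}))  # → False
```

The user never writes validation code directly; the generator turns an abstract model into a working component, which is the essence of this reuse style.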

6. The platform on which the system will run Some component models, such as .NET, are specific to Microsoft platforms. Similarly, generic application systems may be platform-specific, and you may only be able to reuse these if your system is designed for the same platform.

The range of available reuse techniques is such that, in most situations, there is the possibility of some software reuse. Whether or not reuse is achieved is often a managerial rather than a technical issue. Managers may be unwilling to compromise their requirements to allow reusable components to be used. They may not understand the risks associated with reuse as well as they understand the risks of original development. Although the risks of new software development may be higher, some managers may prefer known risks of development to unknown risks of reuse. To promote company-wide reuse, it may be necessary to introduce a reuse program that focuses on the creation of reusable assets and processes to facilitate reuse (Jacobsen, Griss, and Jonsson 1997).

15.2 Application frameworks

Early enthusiasts for object-oriented development suggested that one of the key benefits of using an object-oriented approach was that objects could be reused in different systems. However, experience has shown that objects are often too fine-grained and are often specialized for a particular application. It often takes longer to understand and adapt the object than to reimplement it. It has now become clear that object-oriented reuse is best supported in an object-oriented development process through larger-grain abstractions called frameworks.

As the name suggests, a framework is a generic structure that is extended to create a more specific subsystem or application. Schmidt et al. (Schmidt et al. 2004) define a framework to be

an integrated set of software artifacts (such as classes, objects and components) that collaborate to provide a reusable architecture for a family of related applications.

Frameworks provide support for generic features that are likely to be used in all applications of a similar type. For example, a user interface framework will provide support for interface event handling and will include a set of widgets that can be used to construct displays. It is then left to the developer to specialize these by adding specific functionality for a particular application. For example, in a user interface framework, the developer defines display layouts that are appropriate to the application being implemented.

†Schmidt, D. C., A. Gokhale, and B. Natarajan. 2004. “Leveraging Application Frameworks.” ACM Queue 2 (5 (July/August)): 66–75. doi:10.1145/1016998.1017005.

Figure 15.5 The Model-View-Controller pattern (user inputs are passed to controller methods, which make model edits and updates; view methods query the model and present its state, with controller state and view state linked by view modification messages)

Frameworks support design reuse in that they provide a skeleton architecture for the application as well as the reuse of specific classes in the system. The architecture is implemented by the object classes and their interactions. Classes are reused directly and may be extended using features such as inheritance and polymorphism.

Frameworks are implemented as a collection of concrete and abstract object classes in an object-oriented programming language. Therefore, frameworks are language-specific. Frameworks are available in commonly used object-oriented programming languages such as Java, C#, and C++, as well as in dynamic languages such as Ruby and Python. In fact, a framework can incorporate other frameworks, where each framework is designed to support the development of part of the application. You can use a framework to create a complete application or to implement part of an application, such as the graphical user interface.

The most widely used application frameworks are web application frameworks (WAFs), which support the construction of dynamic websites. The architecture of a WAF is usually based on the Model-View-Controller (MVC) Composite pattern shown in Figure 15.5. The MVC pattern was originally proposed in the 1980s as an approach to GUI design that allowed for multiple presentations of an object and separate styles of interaction with each of these presentations. In essence, it separates the state from its presentation so that the state may be updated from each presentation.

An MVC framework supports the presentation of data in different ways and allows interaction with each of these presentations. When the data is modified through one of the presentations, the system model is changed and the controllers associated with each view update their presentation.
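A deliberately simplified sketch of this separation is shown below. It is not the API of any real MVC framework; the class names are illustrative only:

```python
# Minimal Model-View-Controller sketch: the controller handles user inputs,
# edits the model, and each registered view re-renders from model state.
class Model:
    def __init__(self):
        self.items = []
        self.views = []

    def add_item(self, item):          # model edit
        self.items.append(item)
        for view in self.views:        # state change propagates to views
            view.render(self)


class ListView:
    def __init__(self):
        self.last_render = None

    def render(self, model):           # model query
        self.last_render = ", ".join(model.items)


class Controller:
    def __init__(self, model):
        self.model = model

    def handle_input(self, user_input):  # user input arrives here
        self.model.add_item(user_input.strip())


model = Model()
view = ListView()
model.views.append(view)
controller = Controller(model)

controller.handle_input("  apples ")
controller.handle_input("pears")
print(view.last_render)  # → apples, pears
```

Because the model knows nothing about how it is displayed, further views (a chart, a summary count) can be registered without changing the model or controller code.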

Frameworks are often implementations of design patterns, as discussed in Chapter 7. For example, an MVC framework includes the Observer pattern, the Strategy pattern, the Composite pattern, and a number of others that are discussed by Gamma et al. (Gamma et al. 1995). The general nature of patterns and their use of abstract and concrete classes allow for extensibility. Without patterns, frameworks would almost certainly be impractical.


Figure 15.6 Inversion of control in frameworks (GUI, database, and platform event loops in the framework invoke callbacks in application-specific classes)

While each framework includes slightly different functionality, web application frameworks usually provide components and classes that support:

1. Security WAFs may include classes to help implement user authentication (login) and access control to ensure that users can only access permitted functionality in the system.

2. Dynamic web pages Classes are provided to help you define web page templates and to populate these dynamically with specific data from the system database.

3. Database integration Frameworks don’t usually include a database but assume that a separate database, such as MySQL, will be used. The framework may include classes that provide an abstract interface to different databases.

4. Session management Classes to create and manage sessions (a number of interactions with the system by a user) are usually part of a WAF.

5. User interaction Web frameworks provide AJAX (Holdener 2008) and/or HTML5 support (Sarris 2013), which allows interactive web pages to be created. They may include classes that allow device-independent interfaces to be created, which adapt automatically to mobile phones and tablets.
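To make the second of these concrete, the sketch below shows the general idea behind template-based dynamic pages, using only the Python standard library rather than any particular WAF; the page layout and data are invented for illustration:

```python
# Dynamic web pages: a page template is defined once and populated with
# data from the system database at request time.
from string import Template

PAGE = Template(
    "<html><body>"
    "<h1>Welcome, $username</h1>"
    "<p>You have $unread unread messages.</p>"
    "</body></html>"
)


def render_page(user_record):
    """Fill the template with per-user data (a stand-in for a DB query)."""
    return PAGE.substitute(username=user_record["name"],
                           unread=user_record["unread"])


print(render_page({"name": "Alice", "unread": 3}))
```

Real WAF template classes add escaping, loops, and conditionals on top of this basic substitution idea, but the reuse structure is the same: the template is the reusable part, the data is supplied per request.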

To implement a system using a framework, you add concrete classes that inherit operations from abstract classes in the framework. In addition, you define “callbacks”: methods that are called in response to events recognized by the framework. The framework objects, rather than the application-specific objects, are responsible for control in the system. Schmidt et al. (Schmidt, Gokhale, and Natarajan 2004) call this “inversion of control.”

In response to events from the user interface and database, framework objects invoke “hook methods” that are then linked to user-provided functionality. The user-provided functionality defines how the application should respond to the event (Figure 15.6). For example, a framework will have a method that handles a mouse click from the environment. This method is called the hook method, which you must configure to call the appropriate application methods to handle the mouse click.
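The sketch below illustrates this inversion of control; the framework and hook names are invented, but the control relationship is the one described above: the framework’s event loop decides when application code runs.

```python
# Inversion of control: the framework's event loop, not the application,
# decides when code runs. The application only supplies hook methods.
class Framework:
    def __init__(self):
        self._hooks = {}

    def register_hook(self, event_type, callback):
        self._hooks[event_type] = callback

    def event_loop(self, events):
        # The framework dispatches each event to the user-provided hook.
        for event_type, data in events:
            if event_type in self._hooks:
                self._hooks[event_type](data)


# Application-specific code: a hook method called by the framework.
clicks = []


def on_mouse_click(position):
    clicks.append(position)


fw = Framework()
fw.register_hook("mouse_click", on_mouse_click)
fw.event_loop([("mouse_click", (10, 20)), ("key_press", "a"),
               ("mouse_click", (30, 40))])
print(clicks)  # → [(10, 20), (30, 40)]
```

Note that `on_mouse_click` is never called directly by the application; the framework invokes it when the corresponding event occurs, which is exactly why debugging such systems requires understanding the framework’s control flow.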


Fayad and Schmidt (Fayad and Schmidt 1997) discuss three other classes of framework:

1. System infrastructure frameworks support the development of system infrastructures such as communications, user interfaces, and compilers.

2. Middleware integration frameworks consist of a set of standards and associated object classes that support component communication and information exchange. Examples of this type of framework include Microsoft’s .NET and Enterprise Java Beans (EJB). These frameworks provide support for standardized component models, as discussed in Chapter 16.

3. Enterprise application frameworks are concerned with specific application domains such as telecommunications or financial systems (Baumer et al. 1997). These embed application domain knowledge and support the development of end-user applications. These are not now widely used and have been largely superseded by software product lines.

Applications that are constructed using frameworks can be the basis for further reuse through the concept of software product lines or application families. Because these applications are constructed using a framework, modifying family members to create instances of the system is often a straightforward process. It involves rewriting concrete classes and methods that you have added to the framework.

Frameworks are a very effective approach to reuse. However, they are expensive to introduce into software development processes because they are inherently complex, and it can take several months to learn to use them. It can be difficult and expensive to evaluate available frameworks in order to choose the most appropriate one. Debugging framework-based applications is more difficult than debugging original code because you may not understand how the framework methods interact. Debugging tools may provide information about the reused framework components that the developer does not understand.

15.3 Software product lines

When a company has to support a number of similar but not identical systems, one of the most effective approaches to reuse is to create a software product line. Hardware control systems are often developed using this approach to reuse, as are domain-specific applications in areas such as logistics or medical systems. For example, a printer manufacturer has to develop printer control software, with a specific version of the product for each type of printer. These software versions have much in common, so it makes sense to create a core product (the product line) and adapt this for each printer type.

A software product line is a set of applications with a common architecture and shared components, with each application specialized to reflect specific customer requirements. The core system is designed so that it can be configured and adapted to suit the needs of different customers or equipment. This may involve the configuration of some components, implementing additional components, and modifying some of the components to reflect new requirements.

†Fayad, M. E., and D. C. Schmidt. 1997. "Object-Oriented Application Frameworks." Comm. ACM 40 (10): 32–38. doi:10.1145/262793.262798.

Figure 15.7 The organization of a base system for a product line (layers, from top: specialized application components; configurable application components; core components)

Developing applications by adapting a generic version of the application means that a high proportion of the application code is reused in each system. Testing is simplified because tests for large parts of the application may also be reused, thus reducing the overall application development time. Engineers learn about the application domain through the software product line and so become specialists who can work quickly to develop new applications.

Software product lines usually emerge from existing applications. That is, an organization develops an application and then, when a similar system is required, informally reuses code from it in the new application. The same process is used as other similar applications are developed. However, change tends to corrupt the application structure, so, as more new instances are developed, it becomes increasingly difficult to create a new version. Consequently, a decision may then be made to design a generic product line. This involves identifying common functionality in product instances and developing a base application, which is then used for future development.

This base application (Figure 15.7) is designed to simplify reuse and reconfiguration. Generally, a base application includes:

1. Core components that provide infrastructure support. These are not usually modified when developing a new instance of the product line.

2. Configurable components that may be modified and configured to specialize them to a new application. Sometimes it is possible to reconfigure these components without changing their code by using a built-in component configuration language.

3. Specialized, domain-specific components, some or all of which may be replaced when a new instance of a product line is created.
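The three kinds of component above can be sketched as follows. All of the class names and the configuration key are hypothetical, invented purely to illustrate how a product instance might be assembled from core, configurable, and specialized parts.

```python
class TransactionLog:
    """Core component: reused unchanged in every product instance."""

    def record(self, event):
        return f"logged:{event}"


class ReportFormatter:
    """Configurable component: specialized through configuration data
    rather than code changes."""

    def __init__(self, config):
        self.heading = config.get("heading", "Report")

    def format(self, lines):
        return [self.heading] + list(lines)


class PrinterDriver:
    """Specialized, domain-specific component: replaced when a new
    product-line instance is created."""

    def emit(self, text):
        raise NotImplementedError


class LaserDriver(PrinterDriver):
    def emit(self, text):
        return f"laser:{text}"


def build_instance(config, driver_cls):
    # Assemble one product instance from the three kinds of component.
    return TransactionLog(), ReportFormatter(config), driver_cls()


log, fmt, drv = build_instance({"heading": "Q1 Sales"}, LaserDriver)
print(fmt.format(["line1"]))  # configured heading plus report content
print(drv.emit("page"))
```

Creating a different family member would mean passing different configuration data and a different driver class, while `TransactionLog` is reused as-is.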

Application frameworks and software product lines have much in common. They both support a common architecture and components, and both require new development to create a specific version of a system. The main differences between these approaches are as follows:

1. Application frameworks rely on object-oriented features such as inheritance and polymorphism to implement extensions to the framework. Generally, the framework code is not modified, and the possible modifications are limited to whatever is supported by the framework. Software product lines are not necessarily created using an object-oriented approach. Application components are changed, deleted, or rewritten. There are no limits, in principle at least, to the changes that can be made.

2. Most application frameworks provide general support rather than domain-specific support. For example, there are application frameworks to create web-based applications. A software product line usually embeds detailed domain and platform information. For example, there could be a software product line concerned with web-based applications for health record management.

3. Software product lines are often control applications for equipment. For example, there may be a software product line for a family of printers. This means that the product line has to provide support for hardware interfacing. Application frameworks are usually software-oriented, and they do not usually include hardware interaction components.

4. Software product lines are made up of a family of related applications, owned by the same organization. When you create a new application, your starting point is often the closest member of the application family, not the generic core application.

If you are developing a software product line using an object-oriented programming language, then you may use an application framework as a basis for the system. You create the core of the product line by extending the framework with domain-specific components, using its built-in mechanisms. There is then a second phase of development in which versions of the system for different customers are created. For example, you can use a web-based framework to build the core of a software product line that supports web-based help desks. This "help desk product line" may then be further specialized to provide particular types of help desk support.

The architecture of a software product line often reflects a general, application-specific architectural style or pattern. For example, consider a product-line system that is designed to handle vehicle dispatching for emergency services. Operators of this system take calls about incidents, find the appropriate vehicle to respond to the incident, and dispatch the vehicle to the incident site. The developers of such a system may market versions of it for police, fire, and ambulance services.

This vehicle dispatching system is an example of a generic resource allocation and management architecture (Figure 15.8). Resource management systems use a database of available resources and include components to implement the resource allocation policy that has been decided by the company using the system. Users interact with a resource management system to request and release resources and to ask questions about resources and their availability.

You can see how this four-layer structure may be instantiated in Figure 15.9, which shows the modules that might be included in a vehicle dispatching system product line. The components at each level in the product-line system are as follows:

1. At the interaction level, components provide an operator display interface and an interface with the communications systems used.

Figure 15.8 The architecture of a resource management system (interaction: user interface; I/O management: user authentication, resource delivery, query management; resource management: resource tracking, resource policy control, resource allocation; database management: transaction management, resource database)

Figure 15.9 A product-line architecture of a vehicle dispatcher system (interaction: operator interface, comms system interface; I/O management: operator authentication, map and route planner, report generator, query manager; resource management: vehicle status manager, incident logger, vehicle dispatcher, equipment manager, vehicle locator; database management: transaction management over the equipment database, incident log, vehicle database, and map database)

2. At the I/O management level (level 2), components handle operator authentication, generate reports of incidents and vehicles dispatched, support map output and route planning, and provide a mechanism for operators to query the system databases.

3. At the resource management level (level 3), components allow vehicles to be located and dispatched, update the status of vehicles and equipment, and log details of incidents.

4. At the database level, as well as the usual transaction management support, there are separate databases of vehicles, equipment, and maps.

Figure 15.10 Product instance development (elicit stakeholder requirements; choose closest-fit system instance; renegotiate requirements; adapt existing system; deliver new system instance)

To create a new instance of this system, you may have to modify individual components. For example, the police have a large number of vehicles but a relatively small number of vehicle types. By contrast, the fire service has many types of specialized vehicles but relatively few vehicles. Therefore, when you are implementing a system for these different services, you may have to define a different vehicle database structure.

Various types of specialization of a software product line may be developed:

1. Platform specialization Versions of the application may be developed for different platforms. For example, versions of the application may exist for Windows, Mac OS, and Linux. In this case, the functionality of the application is normally unchanged; only those components that interface with the hardware and operating system are modified.

2. Environment specialization Versions of the application may be created to handle different operating environments and peripheral devices. For example, a system for the emergency services may exist in different versions, depending on the communications hardware used by each service. Police radios, for instance, may have built-in encryption that has to be used. The product-line components are changed to reflect the functionality and characteristics of the equipment used.

3. Functional specialization Versions of the application may be created for specific customers who have different requirements. For example, a library automation system may be modified depending on whether it is used in a public library, a reference library, or a university library. In this case, components that implement functionality may be modified and new components added to the system.

4. Process specialization The system may be adapted to cope with specific business processes. For example, an ordering system may be adapted to cope with a centralized ordering process in one company and with a distributed process in another.

Figure 15.10 shows the process for extending a software product line to create a new application. The activities in this process are:

1. Elicit stakeholder requirements You may start with a normal requirements engineering process. However, because a system already exists, you can demonstrate the system and have stakeholders experiment with it, expressing their requirements as modifications to the functions provided.

2. Select the existing system that is the closest fit to the requirements When creating a new member of a product line, you may start with the nearest product instance. The requirements are analyzed, and the family member that is the closest fit is chosen for modification.

3. Renegotiate requirements As more details of the required changes emerge and the project is planned, some requirements may be renegotiated with the customer to minimize the changes that will have to be made to the base application.

4. Adapt existing system New modules are developed for the existing system, and existing system modules are adapted to meet the new requirements.

5. Deliver new product family member The new instance of the product line is delivered to the customer. Some deployment-time configuration may be required to reflect the particular environments where the system will be used. At this stage, you should document its key features so that it may be used as a basis for other system developments in the future.

When you create a new member of a product line, you may have to find a compromise between reusing as much of the generic application as possible and satisfying detailed stakeholder requirements. The more detailed the system requirements, the less likely it is that the existing components will meet these requirements. However, if stakeholders are willing to be flexible and to limit the system modifications that are required, you can usually deliver the system more quickly and at a lower cost.

Software product lines are designed to be reconfigurable. This reconfiguration may involve adding or removing components from the system, defining parameters and constraints for system components, and including knowledge of business processes. This configuration may occur at different stages in the development process:

1. Design-time configuration The organization that is developing the software modifies a common product-line core by developing, selecting, or adapting components to create a new system for a customer.

2. Deployment-time configuration A generic system is designed for configuration by a customer or by consultants working with the customer. Knowledge of the customer's specific requirements and the system's operating environment is embedded in the configuration data used by the generic system.

When a system is configured at design time, the supplier starts with either a generic system or an existing product instance. By modifying and extending modules in this system, the supplier creates a specific system that delivers the required customer functionality. This usually involves changing and extending the source code of the system, so greater flexibility is possible than with deployment-time configuration.

Figure 15.11 Deployment-time configuration (a configuration planning tool populates a configuration database, which the generic system, along with its system database, consults when executing)

Design-time configuration is used when it is impossible to use the existing deployment-time configuration facilities in a system to develop a new system version. However, over time, when you have created several family members with comparable functionality, you may decide to refactor the core product line to include functionality that has been implemented in several application family members. You then make that new functionality configurable when the system is deployed.

Deployment-time configuration involves using a configuration tool to create a specific system configuration that is recorded in a configuration database or as a set of configuration files (Figure 15.11). The executing system, which may either run on a server or as a stand-alone system on a PC, consults this database when executing so that its functionality may be specialized to its execution context.

Several levels of deployment-time configuration may be provided in a system:

1. Component selection, where you select the modules in a system that provide the required functionality. For example, in a patient information system, you may select an image management component that allows you to link medical images (X-rays, CT scans, etc.) to the patient's medical record.

2. Workflow and rule definition, where you define workflows (how information is processed, stage by stage) and validation rules that should apply to information entered by users or generated by the system.

3. Parameter definition, where you specify the values of specific system parameters that reflect the instance of the application that you are creating. For example, you may specify the maximum length of fields for data input by a user or the characteristics of hardware attached to the system.
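The three configuration levels above can be sketched as a configuration file that the executing system reads at start-up. The JSON layout, key names, and component names below are all invented for illustration; a real configurable system would define its own notation.

```python
import json

# Deployment-time configuration data, as a consultant might record it:
# component selection, a validation rule, and a hardware parameter.
config_text = json.dumps({
    "components": ["records", "imaging"],          # 1. component selection
    "rules": {"max_name_length": 40},              # 2. validation rules
    "parameters": {"scanner_model": "HypoScan-3"}  # 3. parameter definition
})

# The executing system consults the configuration to specialize itself.
config = json.loads(config_text)


def validate_name(name, config):
    # Apply a deployment-time validation rule to user input.
    return len(name) <= config["rules"]["max_name_length"]


print("imaging" in config["components"])  # the imaging component was selected
print(validate_name("Jane Doe", config))  # input satisfies the configured rule
```

The same system binary could behave quite differently at another site simply by shipping it with a different configuration file.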

Deployment-time configuration can be very complex, and for large systems it may take several months to configure and test a system for a customer. Large configurable systems may support this work with software tools, such as planning tools. I discuss deployment-time configuration further in Section 15.4.1, which covers the reuse of application systems that have to be configured to work in different operational environments.


15.4 Application system reuse

An application system product is a software system that can be adapted to the needs of different customers without changing its source code. Application systems are developed by a system vendor for a general market; they are not specially developed for an individual customer. These system products are sometimes known as COTS (Commercial Off-the-Shelf) products. However, the term "COTS" is mostly used in military systems, and I prefer to call these system products application systems.

Virtually all desktop software for business, and many server-based systems, are application systems. This software is designed for general use, so it includes many features and functions. It therefore has the potential to be reused in different environments and as part of different applications. Torchiano and Morisio (Torchiano and Morisio 2004) also discovered that open-source products were often used without change and without looking at the source code.

Application system products are adapted by using built-in configuration mechanisms that allow the functionality of the system to be tailored to specific customer needs. For example, in a hospital patient record system, separate input forms and output reports might be defined for different types of patients. Other configuration features may allow the system to accept plug-ins that extend functionality or check user inputs to ensure that they are valid.

This approach to software reuse has been very widely adopted by large companies since the late 1990s, as it offers significant benefits over customized software development:

1. As with other types of reuse, more rapid deployment of a reliable system may be possible.

2. It is possible to see what functionality is provided by the applications, and so it is easier to judge whether or not they are likely to be suitable. Other companies may already use the applications, so experience of the systems is available.

3. Some development risks are avoided by using existing software. However, this approach has its own risks, as I discuss below.

4. Businesses can focus on their core activity without having to devote a lot of resources to IT systems development.

5. As operating platforms evolve, technology updates may be simplified as these are the responsibility of the application system vendor rather than the customer.

Of course, this approach to software engineering has its own problems:

1. Requirements usually have to be adapted to reflect the functionality and mode of operation of the off-the-shelf application system. This can lead to disruptive changes to existing business processes.

2. The application system may be based on assumptions that are practically impossible to change. The customer must therefore adapt its business to reflect these assumptions.

3. Choosing the right application system for an enterprise can be a difficult process, especially as many of these systems are not well documented. Making the wrong choice means that it may be impossible to make the new system work as required.

4. There may be a lack of local expertise to support systems development. Consequently, the customer has to rely on the vendor and external consultants for development advice. This advice may be geared to selling products and services, with insufficient time taken to understand the real needs of the customer.

5. The system vendor controls system support and evolution. It may go out of business, be taken over, or make changes that cause difficulties for customers.

Figure 15.12 Individual and integrated application systems

Configurable application systems: a single product that provides the functionality required by a customer; based on a generic solution and standardized processes; development focus is on system configuration; the system vendor is responsible for maintenance; the system vendor provides the platform for the system.

Application system integration: several different application systems are integrated to provide customized functionality; flexible solutions may be developed for customer processes; development focus is on system integration; the system owner is responsible for maintenance; the system owner provides the platform for the system.

Application systems may be used as individual systems or in combination, where two or more systems are integrated. Individual systems consist of a generic application from a single vendor that is configured to customer requirements. Integrated systems involve integrating the functionality of individual systems, often from different vendors, to create a new application system. Figure 15.12 summarizes the differences between these approaches. I discuss application system integration in Section 15.4.2.

15.4.1 Configurable application systems

Configurable application systems are generic application systems that may be designed to support a particular business type, business activity, or, sometimes, a complete business enterprise. For example, a system produced for dentists may handle appointments, reminders, dental records, patient recall, and billing. At a larger scale, an Enterprise Resource Planning (ERP) system may support the manufacturing, ordering, and customer relationship management processes in a large company.

Domain-specific application systems, such as systems to support a business function (e.g., document management), provide functionality that is likely to be required by a range of potential users. However, they also incorporate built-in assumptions about how users work, and these assumptions may cause problems in specific situations. For example, a system to support student registration in a university may assume that students will be registered for one degree at one university. However, if universities collaborate to offer joint degrees, then it may be practically impossible to represent this detail in the system.

Figure 15.13 The architecture of an ERP system (modules for purchasing, supply chain, logistics, and CRM, each with associated processes, built on shared business rules and a system database)

Enterprise Resource Planning (ERP) systems, such as those produced by SAP and Oracle, are large-scale integrated systems designed to support business practices such as ordering and invoicing, inventory management, and manufacturing scheduling (Monk and Wagner 2013). The configuration process for these systems involves gathering detailed information about the customer's business and business processes, and embedding this information in a configuration database. This often requires detailed knowledge of configuration notations and tools and is usually carried out by consultants working alongside system customers.

A generic ERP system includes a number of modules that may be composed in different ways to create a system for a customer. The configuration process involves choosing which modules are to be included, configuring these individual modules, defining business processes and business rules, and defining the structure and organization of the system database. A model of the overall architecture of an ERP system that supports a range of business functions is shown in Figure 15.13.

The key features of this architecture are as follows:

1. A number of modules to support different business functions. These are large-grain modules that may support entire departments or divisions of the business. In the example shown in Figure 15.13, the modules that have been selected for inclusion in the system are a module to support purchasing; a module to support supply chain management; a logistics module to support the delivery of goods; and a customer relationship management (CRM) module to maintain customer information.

2. A defined set of business process models, associated with each module, which relate to activities in that module. For example, the ordering process model may define how orders are created and approved. This will specify the roles and activities involved in placing an order.

3. A common database that maintains information about all related business functions. Thus, it should not be necessary to replicate information, such as customer details, in different parts of the business.

4. A set of business rules that apply to all data in the database. Therefore, when data is input from one function, these rules should ensure that it is consistent with the data required by other functions. For example, a business rule may require that all expense claims have to be approved by someone more senior than the person making the claim.
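The expense-claim rule above can be sketched as a check applied whenever claim data enters the shared database. The seniority grades and their ordering are hypothetical, chosen only to illustrate the idea of a business rule enforced uniformly across modules.

```python
# Hypothetical seniority ordering: higher number means more senior.
GRADES = {"clerk": 1, "manager": 2, "director": 3}


def claim_is_valid(claimant_grade, approver_grade):
    """Business rule: the approver must be more senior than the claimant."""
    return GRADES[approver_grade] > GRADES[claimant_grade]


print(claim_is_valid("clerk", "manager"))    # True: manager outranks clerk
print(claim_is_valid("manager", "manager"))  # False: approver not more senior
```

Because the rule is attached to the shared database rather than to any one module, a claim entered through purchasing, logistics, or CRM would all be checked in the same way.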

ERP systems are used in almost all large companies to support some or all of their functions. They are, therefore, a very widely used form of software reuse. The obvious limitation of this approach to reuse is that the functionality of the customer's application is restricted to the functionality of the ERP system's built-in modules. If a company needs additional functionality, it may have to develop a separate add-on system to provide it.

Furthermore, the buyer company's processes and operations have to be defined in the ERP system's configuration language. This language embeds the understanding of business processes as seen by the system vendor, and there may be a mismatch between these assumptions and the concepts and processes used in the customer's business. A serious mismatch between the customer's business model and the system model used by the ERP system makes it highly probable that the ERP system will not meet the customer's real needs (Scott 1999).

For example, in an ERP system that was sold to a university, a fundamental system concept was the notion of a customer. In this system, a customer was an external agent that bought goods and services from a supplier. This concept caused great difficulties when configuring the system. Universities do not really have customers. Rather, they have customer-type relationships with a range of people and organizations, such as students, research funding agencies, and educational charities. None of these relationships is compatible with a customer relationship in which a person or business buys products or services from another. In this particular case, it took several months to resolve the mismatch, and the final solution only partially met the university's requirements.

ERP systems usually require extensive configuration to adapt them to the requirements of each organization where they are installed. This configuration may involve:

1. Selecting the required functionality from the system, for example, by deciding what modules should be included.

2. Establishing a data model that defines how the organization's data will be structured in the system database.

3. Defining business rules that apply to that data.

4. Defining the expected interactions with external systems.

5. Designing the input forms and the output reports generated by the system.

6. Designing new business processes that conform to the underlying process model supported by the system.

7. Setting parameters that define how the system is deployed on its underlying platform.


Once the configuration settings are completed, the new system is ready for testing. Testing is a major problem when systems are configured rather than programmed in a conventional language. There are two reasons for this:

1. Test automation may be difficult or impossible. There may be no easy access to an API that can be used by testing frameworks such as JUnit, so the system has to be tested manually, with testers inputting test data to the system. Furthermore, systems are often specified informally, so defining test cases may be difficult without a lot of help from end-users.

2. System errors are often subtle and specific to business processes. The application system or ERP system is a reliable platform, so technical system failures are rare. The problems that occur are often due to misunderstandings between those configuring the system and user stakeholders. System testers without detailed knowledge of the end-user processes cannot detect these errors.

15.4.2 Integrated application systems

Integrated application systems include two or more application systems or, sometimes, legacy systems. You may use this approach when no single application system meets all of your needs or when you wish to integrate a new application system with systems that you are already using. The component systems may interact through their APIs or service interfaces, if these are defined. Alternatively, they may be composed by connecting the output of one system to the input of another or by updating the databases used by the applications.

To develop integrated application systems, you have to make a number of design choices:

1. Which individual application systems offer the most appropriate functionality? Typically, several system products will be available, which can be combined in different ways. If you don't already have experience with a particular application system, it can be difficult to decide which product is the most suitable.

2. How will data be exchanged? Different systems normally use unique data structures and formats. You have to write adaptors that convert from one representation to another. These adaptors are runtime systems that operate alongside the constituent application systems.

3. What features of a product will actually be used? Individual application systems may include more functionality than you need, and functionality may be duplicated across different products. You have to decide which features in which product are most appropriate for your requirements. If possible, you should also deny access to unused functionality because it can interfere with normal system operation.
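The data-exchange adaptors of design choice 2 can be sketched as a small conversion function. Both data formats below are invented for illustration: a dictionary standing in for an e-commerce order record, and a pipe-delimited string standing in for a legacy system's fixed-field format.

```python
def adapt_order(ecommerce_order: dict) -> str:
    """Convert a hypothetical e-commerce order record into the
    pipe-delimited record a hypothetical legacy ordering system expects."""
    return "|".join([
        ecommerce_order["customer"].upper(),  # legacy names are uppercase
        ecommerce_order["item"],
        f"{ecommerce_order['qty']:04d}",      # legacy quantity is 4 digits
    ])


order = {"item": "toner-X2", "qty": 12, "customer": "acme"}
print(adapt_order(order))  # ACME|toner-X2|0012
```

A real adaptor would run continuously alongside the two systems and handle error cases, but the core job is the same: translating between representations neither system was designed to share.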

Figure 15.14 An integrated procurement system (client: web browser and email system; server: e-commerce system connected through an adaptor to the ordering and invoicing system, with a second adaptor connecting the ordering system to the email system)

Consider the following scenario as an illustration of application system integration. A large organization intends to develop a procurement system that allows staff to place orders from their desk. By introducing this system across the organization, the company estimates that it can save $5 million per year. By centralizing buying, the new procurement system can ensure that orders are always made from suppliers who offer the best prices and should reduce the administration associated with orders. As with manual systems, the system involves choosing the goods available from a supplier, creating an order, having the order approved, sending the order to a supplier, receiving the goods, and confirming that payment should be made.

The company has a legacy ordering system that is used by a central procurement office. This order processing software is integrated with an existing invoicing and delivery system. To create the new ordering system, the legacy system is integrated with a web-based e-commerce platform and an email system that handles communications with users. The structure of the final procurement system is shown in Figure 15.14.

This procurement system should be a client–server system with standard web browsing and email systems used on the client. On the server, the e-commerce platform has to integrate with the existing ordering system through an adaptor. The e-commerce system has its own format for orders, confirmations of delivery, and so forth, and these have to be converted into the format used by the ordering system. The e-commerce system uses the email system to send notifications to users, but the ordering system was never designed for this purpose. Therefore, another adaptor has to be written to convert the notifications from the ordering system into email messages.

Months, sometimes years, of implementation effort can be saved, and the time to develop and deploy a system can be drastically reduced by integrating existing application systems. The procurement system described above was implemented and deployed in a very large company in nine months. It had originally been estimated that it would take three years to develop a procurement system in Java that could be integrated with the legacy ordering system.


Figure 15.15 Application wrapping. A service wrapper surrounds the application system and exposes its functionality as a set of services.

Application system integration can be simplified if a service-oriented approach is used. Essentially, a service-oriented approach means allowing access to the application system’s functionality through a standard service interface, with a service for each discrete unit of functionality. Some applications may offer a service interface, but sometimes this service interface has to be implemented by the system integrator. Essentially, you have to program a wrapper that hides the application and provides externally visible services (Figure 15.15). This approach is particularly valuable for legacy systems that have to be integrated with newer application systems.
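The wrapper idea can be made concrete with a short sketch. The legacy catalog API below is a stand-in invented for illustration; the point is only that the wrapper exposes a small, clean service interface and hides everything else.

```python
# Sketch of application wrapping (the Figure 15.15 idea). The legacy
# API is hypothetical; a real wrapper would call the actual system.

class LegacyCatalog:
    """Stand-in for a legacy application with a non-service interface."""
    def __init__(self):
        self._rows = {"A100": ("Widget", 250), "B200": ("Gadget", 990)}

    def RAW_LOOKUP(self, key):  # awkward legacy-style call
        return self._rows.get(key)

class CatalogService:
    """Service wrapper: provides externally visible services and hides
    the legacy application behind them."""
    def __init__(self, legacy):
        self._legacy = legacy

    def get_item(self, sku: str) -> dict:
        row = self._legacy.RAW_LOOKUP(sku)
        if row is None:
            raise KeyError(sku)
        name, price_cents = row
        return {"sku": sku, "name": name, "price_cents": price_cents}

service = CatalogService(LegacyCatalog())
print(service.get_item("A100"))
```

Newer systems then depend only on the wrapper’s service interface, so the legacy internals can change, or be replaced, without affecting them.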

In principle, integrating application systems is the same as integrating any other component. You have to understand the system interfaces and use them exclusively to communicate with the software; you have to trade off specific requirements against rapid development and reuse; and you have to design a system architecture that allows the application systems to operate together.

However, the fact that these products are usually large systems in their own right, and are often sold as separate standalone systems, introduces additional problems. Boehm and Abts (Boehm and Abts 1999) highlight four important system integration problems:

1. Lack of control over functionality and performance. Although the published interface of a product may appear to offer the required facilities, the system may not be properly implemented or may perform poorly. The product may have hidden operations that interfere with its use in a specific situation. Fixing these problems may be a priority for the system integrator but may not be of real concern for the product vendor. Users may simply have to find workarounds to problems if they wish to reuse the application system.

2. Problems with system interoperability. It is sometimes difficult to get individual application systems to work together because each system embeds its own assumptions about how it will be used. Garlan et al. (Garlan, Allen, and Ockerbloom 1995), reporting on their experience integrating four application systems, found that three of these products were event-based but that each used a different model of events. Each system assumed that it had exclusive access to the event queue. As a consequence, integration was very difficult. The project required five times as much effort as originally predicted. The schedule was extended to two years rather than the predicted six months.

In a retrospective analysis of their work 10 years later, Garlan et al. (Garlan, Allen, and Ockerbloom 2009) concluded that the integration problems that they discovered had not been solved. Torchiano and Morisio (Torchiano and Morisio 2004) found that lack of compliance with standards in many application systems meant that integration was more difficult than anticipated.

3. No control over system evolution. Vendors of application systems make their own decisions on system changes, in response to market pressures. For PC products in particular, new versions are often produced frequently and may not be compatible with all previous versions. New versions may have additional unwanted functionality, and previous versions may become unavailable and unsupported.

4. Support from system vendors. The level of support available from system vendors varies widely. Vendor support is particularly important when problems arise, as developers do not have access to the source code and detailed documentation of the system. While vendors may commit to providing support, changing market and economic circumstances may make it difficult for them to deliver this commitment. For example, a system vendor may decide to discontinue a product because of limited demand, or they may be taken over by another company that does not wish to support the products that have been acquired.

Boehm and Abts reckon that, in many cases, the cost of system maintenance and evolution may be greater for integrated application systems. The above difficulties are life-cycle problems; they don’t just affect the initial development of the system. The further removed the people involved in the system maintenance become from the original system developers, the more likely it is that difficulties will arise with the integrated system.

Key Points

- There are many different ways to reuse software. These range from the reuse of classes and methods in libraries to the reuse of complete application systems.

- The advantages of software reuse are lower costs, faster software development, and lower risks. System dependability is increased. Specialists can be used more effectively by concentrating their expertise on the design of reusable components.

- Application frameworks are collections of concrete and abstract objects that are designed for reuse through specialization and the addition of new objects. They usually incorporate good design practice through design patterns.

- Software product lines are related applications that are developed from one or more base applications. A generic system is adapted and specialized to meet specific requirements for functionality, target platform, or operational configuration.

- Application system reuse is concerned with the reuse of large-scale, off-the-shelf systems. These provide a lot of functionality, and their reuse can radically reduce costs and development time. Systems may be developed by configuring a single, generic application system or by integrating two or more application systems.

- Potential problems with application system reuse include lack of control over functionality, performance, and system evolution; the need for support from external vendors; and difficulties in ensuring that systems can interoperate.

Further Reading

“Overlooked Aspects of COTS-Based Development.” An interesting article that discusses a survey of developers using a COTS-based approach, and the problems that they encountered. (M. Torchiano and M. Morisio, IEEE Software, 21 (2), March–April 2004) http://dx.doi.org/10.1109/MS.2004.1270770

CRUISE—Component Reuse in Software Engineering. This e-book covers a wide range of reuse topics, including case studies, component-based reuse, and reuse processes. However, its coverage of application system reuse is limited. (L. Nascimento et al., 2007) http://www.academia.edu/179616/C.R.U.I.S.E_-_Component_Reuse_in_Software_Engineering

“Construction by Configuration: A New Challenge for Software Engineering.” In this invited paper, I discuss the problems and difficulties of constructing a new application by configuring existing systems. (I. Sommerville, Proc. 19th Australian Software Engineering Conference, 2008) http://dx.doi.org/10.1109/ASWEC.2008.75

“Architectural Mismatch: Why Reuse Is Still So Hard.” This article looks back on an earlier paper that discussed the problems of reusing and integrating a number of application systems. The authors concluded that, although some progress has been made, there were still problems in conflicting assumptions made by the designers of the individual systems. (D. Garlan et al., IEEE Software, 26 (4), July–August 2009) http://dx.doi.org/10.1109/MS.2009.86

Website

PowerPoint slides for this chapter:
www.pearsonglobaleditions.com/Sommerville

Links to supporting videos:
http://software-engineering-book.com/videos/software-reuse/


Exercises

15.1. What major technical and nontechnical factors hinder software reuse? Do you personally reuse much software and, if not, why not?

15.2. List the benefits of software reuse and explain why the expected lifetime of the software should be considered when planning reuse.

15.3. How does the base application’s design in the product line simplify reuse and reconfiguration?

15.4. Explain what is meant by “inversion of control” in application frameworks. Explain why this approach could cause problems if you integrated two separate systems that were originally created using the same application framework.

15.5. Using the example of the weather station system described in Chapters 1 and 7, suggest a product-line architecture for a family of applications that are concerned with remote monitoring and data collection. You should present your architecture as a layered model, showing the components that might be included at each level.

15.6. Most desktop software, such as word processing software, can be configured in a number of different ways. Examine software that you regularly use and list the configuration options for that software. Suggest difficulties that users might have in configuring the software. Microsoft Office (or one of its open-source alternatives) is a good example to use for this exercise.

15.7. Why have many large companies chosen ERP systems as the basis for their organizational information system? What problems may arise when deploying a large-scale ERP system in an organization?

15.8. What are the significant benefits offered by the application system reuse approach when compared with the custom software development approach?

15.9. Explain why adaptors are usually needed when systems are constructed by integrating application systems. Suggest three practical problems that might arise in writing adaptor software to link two application systems.

15.10. The reuse of software raises a number of copyright and intellectual property issues. If a customer pays a software contractor to develop a system, who has the right to reuse the developed code? Does the software contractor have the right to use that code as a basis for a generic component? What payment mechanisms might be used to reimburse providers of reusable components? Discuss these issues and other ethical issues associated with the reuse of software.

References

Baumer, D., G. Gryczan, R. Knoll, C. Lilienthal, D. Riehle, and H. Zullighoven. 1997. “Framework Development for Large Systems.” Comm. ACM 40 (10): 52–59. doi:10.1145/262793.262804.

Boehm, B., and C. Abts. 1999. “COTS Integration: Plug and Pray?” Computer 32 (1): 135–138. doi:10.1109/2.738311.

Fayad, M. E., and D. C. Schmidt. 1997. “Object-Oriented Application Frameworks.” Comm. ACM 40 (10): 32–38. doi:10.1145/262793.262798.

Gamma, E., R. Helm, R. Johnson, and J. Vlissides. 1995. Design Patterns: Elements of Reusable Object-Oriented Software. Reading, MA: Addison-Wesley.

Garlan, D., R. Allen, and J. Ockerbloom. 1995. “Architectural Mismatch: Why Reuse Is So Hard.” IEEE Software 12 (6): 17–26. doi:10.1109/52.469757.

––––––. 2009. “Architectural Mismatch: Why Reuse Is Still So Hard.” IEEE Software 26 (4): 66–69. doi:10.1109/MS.2009.86.

Holdener, A. T. 2008. Ajax: The Definitive Guide. Sebastopol, CA: O’Reilly and Associates.

Jacobsen, I., M. Griss, and P. Jonsson. 1997. Software Reuse. Reading, MA: Addison-Wesley.

Monk, E., and B. Wagner. 2013. Concepts in Enterprise Resource Planning, 4th ed. Independence, KY: CENGAGE Learning.

Sarris, S. 2013. HTML5 Unleashed. Indianapolis, IN: Sams Publishing.

Schmidt, D. C., A. Gokhale, and B. Natarajan. 2004. “Leveraging Application Frameworks.” ACM Queue 2 (5): 66–75. doi:10.1145/1016998.1017005.

Scott, J. E. 1999. “The FoxMeyer Drug’s Bankruptcy: Was It a Failure of ERP?” In Proc. Association for Information Systems 5th Americas Conf. on Information Systems. Milwaukee, WI. http://www.uta.edu/faculty/weltman/OPMA5364TW/FoxMeyer.pdf

Torchiano, M., and M. Morisio. 2004. “Overlooked Aspects of COTS-Based Development.” IEEE Software 21 (2): 88–93. doi:10.1109/MS.2004.1270770.

16 Component-based software engineering

Objectives

The objective of this chapter is to describe an approach to software reuse based on the composition of standardized, reusable components. When you have read this chapter, you will:

- understand what is meant by a software component that may be included in a program as an executable element;

- understand the key elements of software component models and the support provided by middleware for these models;

- be aware of the key activities in the component-based software engineering (CBSE) process for reuse and the CBSE process with reuse;

- understand three different types of component composition and some of the problems that have to be resolved when components are composed to create new components or systems.

Contents

16.1 Components and component models
16.2 CBSE processes
16.3 Component composition


Component-based software engineering (CBSE) emerged in the late 1990s as an approach to software systems development based on reusing software components. Its creation was motivated by frustration that object-oriented development had not led to extensive reuse, as had been originally suggested. Single-object classes were too detailed and specific and often had to be bound with an application at compile-time. You had to have detailed knowledge of the classes to use them, which usually meant that you had to have the component source code. Selling or distributing objects as individual reusable components was therefore practically impossible.

Components are higher-level abstractions than objects and are defined by their interfaces. They are usually larger than individual objects, and all implementation details are hidden from other components. Component-based software engineering is the process of defining, implementing, and integrating or composing these loosely coupled, independent components into systems.

CBSE has become an important software development approach for large-scale enterprise systems, with demanding performance and security requirements. Customers are demanding secure and dependable software that is delivered and deployed more quickly. The only way that these demands can be met is to build software by reusing existing components.

The essentials of component-based software engineering are:

1. Independent components that are completely specified by their interfaces. There should be a clear separation between the component interface and its implementation. This means that one implementation of a component can be replaced by another, without the need to change other parts of the system.

2. Component standards that define interfaces and so facilitate the integration of components. These standards are embodied in a component model. They define, at the very minimum, how component interfaces should be specified and how components communicate. Some models go much further and define interfaces that should be implemented by all conformant components. If components conform to standards, then their operation is independent of their programming language. Components written in different languages can be integrated into the same system.

3. Middleware that provides software support for component integration. To make independent, distributed components work together, you need middleware support that handles component communications. Middleware for component support handles low-level issues efficiently and allows you to focus on application-related problems. In addition, middleware for component support may provide support for resource allocation, transaction management, security, and concurrency.

4. A development process that is geared to component-based software engineering. You need a development process that allows requirements to evolve, depending on the functionality of available components.

Problems with CBSE

CBSE is now a mainstream approach to software engineering and is widely used when creating new systems. However, when used as an approach to reuse, problems include component trustworthiness, component certification, requirements compromises, and prediction of the properties of components, especially when they are integrated with other components.

http://software-engineering-book.com/web/cbse-problems/

Component-based development embodies good software engineering practice. It often makes sense to design a system using components, even if you have to develop rather than reuse these components. Underlying CBSE are sound design principles that support the construction of understandable and maintainable software:

1. Components are independent, so they do not interfere with each other’s operation. Implementation details are hidden. The component’s implementation can be changed without affecting the rest of the system.

2. Components communicate through well-defined interfaces. If these interfaces are maintained, one component can be replaced by another component providing additional or enhanced functionality.

3. Component infrastructures offer a range of standard services that can be used in application systems. This reduces the amount of new code that has to be developed.

The initial motivation for CBSE was the need to support both reuse and distributed software engineering. A component was seen as an element of a software system that could be accessed, using a remote procedure call mechanism, by other components running on separate computers. Each system that reused a component had to incorporate its own copy of that component. This idea of a component extended the notion of distributed objects, as defined in distributed systems models such as the CORBA specification (Pope 1997). Several different protocols and technology-specific “standards” were introduced to support this view of a component, including Sun’s Enterprise Java Beans (EJB), Microsoft’s COM and .NET, and CORBA’s CCM (Lau and Wang 2007).

Unfortunately, the companies involved in proposing standards could not agree on a single standard for components, thereby limiting the impact of this approach to software reuse. It is impossible for components developed using different approaches to work together. Components that are developed for different platforms, such as .NET or J2EE, cannot interoperate. Furthermore, the standards and protocols proposed were complex and difficult to understand. This was also a barrier to their adoption.

In response to these problems, the notion of a component as a service was developed, and standards were proposed to support service-oriented software engineering. The most significant difference between a component as a service and the original notion of a component is that services are stand-alone entities that are external to a program using them. When you build a service-oriented system, you reference the external service rather than including a copy of that service in your system.

Service-oriented software engineering is a type of component-based software engineering. It uses a simpler notion of a component than that originally proposed in CBSE, where components were executable routines that were included in larger systems. Each system that used a component embedded its own version of that component. Service-oriented approaches are gradually replacing CBSE with embedded components as an approach to systems development. In this chapter, I discuss the use of CBSE with embedded components; service-oriented software engineering is covered in Chapter 18.

16.1 Components and component models

The software reuse community generally agrees that a component is an independent software unit that can be composed with other components to create a software system. Beyond that, however, people have proposed varying definitions of a software component. Councill and Heineman (Councill and Heineman 2001) define a component as:

    A software element that conforms to a standard component model and can be independently deployed and composed without modification according to a composition standard.

This definition is standards-based so that a software unit that conforms to these standards is a component. Szyperski (Szyperski 2002), however, does not mention standards in his definition of a component but focuses instead on the key characteristics of components:

    A software component is a unit of composition with contractually-specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties.

Both of these definitions were developed around the idea of a component as an element that is embedded in a system, rather than a service that is referenced by the system. However, they are equally applicable to service components.

Szyperski also states that a component has no externally observable state; that is, copies of components are indistinguishable. However, some component models, such as the Enterprise Java Beans model, allow stateful components, so these do not correspond with Szyperski’s definition. While stateless components are certainly simpler to use, in some systems stateful components are more convenient and reduce system complexity.

What the above definitions have in common is that they agree that components are independent and that they are the fundamental unit of composition in a system. I think that, if we combine these proposals, we get a more rounded description of a reusable component. Figure 16.1 shows what I consider to be the essential characteristics of a component as used in CBSE.

†Councill, W. T., and G. T. Heineman. 2001. “Definition of a Software Component and Its Elements.” In Component-Based Software Engineering, edited by G. T. Heineman and W. T. Councill, 5–20. Boston: Addison-Wesley.

‡Szyperski, C. 2002. Component Software: Beyond Object-Oriented Programming, 2nd ed. Harlow, UK: Addison-Wesley.


Figure 16.1 Component characteristics

Composable: For a component to be composable, all external interactions must take place through publicly defined interfaces. In addition, it must provide external access to information about itself, such as its methods and attributes.

Deployable: To be deployable, a component has to be self-contained. It must be able to operate as a stand-alone entity on a component platform that provides an implementation of the component model. This usually means that the component is binary and does not have to be compiled before it is deployed. If a component is implemented as a service, it does not have to be deployed by a user of that component. Rather, it is deployed by the service provider.

Documented: Components have to be fully documented so that potential users can decide whether or not the components meet their needs. The syntax and, ideally, the semantics of all component interfaces should be specified.

Independent: A component should be independent—it should be possible to compose and deploy it without having to use other specific components. In situations where the component needs externally provided services, these should be explicitly set out in a “requires” interface specification.

Standardized: Component standardization means that a component used in a CBSE process has to conform to a standard component model. This model may define component interfaces, component metadata, documentation, composition, and deployment.

A useful way of thinking about a component is as a provider of one or more services, even if the component is embedded rather than implemented as a service. When a system needs something to be done, it calls on a component to provide that service without caring about where that component is executing or the programming language used to develop the component. For example, a component in a system used in a public library might provide a search service that allows users to search the library catalog. A component that converts from one graphical format to another (e.g., TIFF to JPEG) provides a data conversion service and so on.

Viewing a component as a service provider emphasizes two critical characteristics of a reusable component:

1. The component is an independent executable entity that is defined by its interfaces. You don’t need any knowledge of its source code to use it. It can either be referenced as an external service or included directly in a program.

2. The services offered by a component are made available through an interface, and all interactions are through that interface. The component interface is expressed in terms of parameterized operations, and its internal state is never exposed.

In principle, all components have two related interfaces, as shown in Figure 16.2. These interfaces reflect the services that the component provides and the services that the component requires to operate correctly:

1. The “provides” interface defines the services provided by the component. This interface is the component API. It defines the methods that can be called by a user of the component. In a UML component diagram, the “provides” interface for a component is indicated by a circle at the end of a line from the component icon.

2. The “requires” interface specifies the services that other components in the system must provide if a component is to operate correctly. If these services are not available, then the component will not work. This does not compromise the independence or deployability of a component because the “requires” interface does not define how these services should be provided. In the UML, the symbol for a “requires” interface is a semicircle at the end of a line from the component icon. Notice that “provides” and “requires” interface icons can fit together like a ball and socket.

Figure 16.2 Component interfaces. The “requires” interface defines the services that are needed and should be provided by other components; the “provides” interface defines the services that are provided by the component to other components.
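The ball-and-socket relationship can be made concrete in code. In this sketch, which is an illustration rather than any standard component model, a component states its “requires” interface as a constructor parameter typed by an abstract class, and its “provides” interface is simply its public methods; the printing example is invented.

```python
from abc import ABC, abstractmethod

# Illustrative sketch of "provides" and "requires" interfaces.
# No real component model is implied.

class PrintService(ABC):
    """A 'requires' interface: a service some other component must provide."""
    @abstractmethod
    def write(self, line: str) -> None: ...

class Report:
    """A component. Its public methods are its 'provides' interface."""
    def __init__(self, printer: PrintService):
        # The component states *what* it requires, not how it is
        # implemented; any conforming component can be plugged in.
        self._printer = printer
        self._lines = []

    def add(self, line: str) -> None:
        self._lines.append(line)

    def publish(self) -> int:
        for line in self._lines:
            self._printer.write(line)
        return len(self._lines)

class MemoryPrinter(PrintService):
    """A component whose 'provides' interface matches Report's 'requires'."""
    def __init__(self):
        self.output = []
    def write(self, line):
        self.output.append(line)

printer = MemoryPrinter()
report = Report(printer)  # the ball fits the socket
report.add("hello")
count = report.publish()
```

The composition succeeds purely because one component’s “provides” interface satisfies the other’s “requires” interface; neither knows anything else about the other.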

To illustrate these interfaces, Figure 16.3 shows a model of a component that has been designed to collect and collate information from an array of sensors. It runs autonomously to collect data over a period of time and, on request, provides collated data to a calling component. The “provides” interface includes methods to add, remove, start, stop, and test sensors. The report method returns the sensor data that has been collected, and the listAll method provides information about the attached sensors. Although I have not shown them here, these methods have associated parameters specifying the sensor identifiers, locations, and so on.

The “requires” interface is used to connect the component to the sensors. It assumes that sensors have a data interface, accessed through sensorData, and a management interface, accessed through sensorManagement. This interface has been designed to connect to different types of sensors so that it does not include specific sensor operations such as Test and provideReading. Instead, the commands used by a specific type of sensor are embedded in a string, which is a parameter to the operations in the “requires” interface. Adaptor components parse this parameter string and translate the embedded commands into the specific control interface of each type of sensor. I discuss the use of adaptors later in this chapter, where I show how the data collector component may be connected to a sensor (Figure 16.12).
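That translation step can be sketched as follows. Both the command-string format ("CMD:ARG") and the sensor’s control API are assumptions invented for illustration; the text does not specify the real interfaces.

```python
# Illustrative adaptor: parses a generic command string and translates
# the embedded command into calls on one specific sensor's own control
# interface. The string format and sensor API are hypothetical.

class TemperatureSensor:
    """Stand-in for one specific type of sensor."""
    def __init__(self):
        self.running = False
    def power_on(self):
        self.running = True
    def power_off(self):
        self.running = False
    def reading(self):
        return 21.5 if self.running else None

class TemperatureSensorAdaptor:
    """Translates embedded commands into the sensor's own interface,
    so the data collector never sees sensor-specific operations."""
    def __init__(self, sensor):
        self._sensor = sensor

    def sensorManagement(self, command: str):
        cmd, _, arg = command.partition(":")
        if cmd == "START":
            self._sensor.power_on()
        elif cmd == "STOP":
            self._sensor.power_off()
        else:
            raise ValueError(f"unknown command {cmd!r}")

    def sensorData(self):
        return self._sensor.reading()

adaptor = TemperatureSensorAdaptor(TemperatureSensor())
adaptor.sensorManagement("START")
```

A different sensor type would get its own adaptor with the same two operations, so the data collector can drive any sensor through one uniform “requires” interface.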

Figure 16.3 A model of a data collector component. The “provides” interface offers addSensor, removeSensor, startSensor, stopSensor, testSensor, initialize, report, and listAll; the “requires” interface comprises sensorManagement and sensorData.


Components and objects

Components are often implemented in object-oriented languages, and, in some cases, accessing the “provides” interface of a component is done through method calls. However, components and object classes are not the same thing. Unlike object classes, components are independently deployable, do not define types, are language-independent, and are based on a standard component model.

http://software-engineering-book.com/web/components-and-objects/

Components are accessed using remote procedure calls (RPCs). Each component has a unique identifier and, using this name, may be called from another computer. The called component uses the same mechanism to access the “required” components that are defined in its interface.

An important difference between a component as an external service and a component as a program element accessed using a remote procedure call is that services are completely independent entities. They do not have an explicit “requires” interface. Of course, they do require other components to support their operation, but these are provided internally. Other programs can use services without the need to implement any additional support required by the service.

16.1.1 Component models

A component model is a definition of standards for component implementation, documentation, and deployment. These standards are for component developers to ensure that components can interoperate. They are also for providers of component execution infrastructures who provide middleware to support component operation. For service components, the most important component models are the Web Service models; for embedded components, widely used models include the Enterprise Java Beans (EJB) model and Microsoft’s .NET model (Lau and Wang 2007).

The basic elements of an ideal component model are discussed by

Weinreich and

Sametinger (Weinreich and Sametinger 2001). I summarize these model

elements in

Figure 16.4. This diagram shows that the elements of a component model

define the

component interfaces, the information that you need to use the component

in a pro-

gram, and how a component should be deployed:

1. Interfaces Components are defined by specifying their interfaces. The component model specifies how the interfaces should be defined and the elements, such as operation names, parameters, and exceptions, which should be included in the interface definition. The model should also specify the language used to define the component interfaces.

For web services, interface specification uses XML-based languages, as discussed in Chapter 18; EJB is Java-specific, so Java is used as the interface definition language; in .NET, interfaces are defined using Microsoft’s Common Intermediate Language (CIL). Some component models require specific interfaces that must be defined by a component. These are used to compose the component with the component model infrastructure, which provides standardized services such as security and transaction management.

Figure 16.4 Basic elements of a component model (interfaces, usage information, and deployment and use; elements include interface definition, specific interfaces, composition, naming convention, meta-data access, customization, packaging, documentation, and evolution support)

2. Usage In order for components to be distributed and accessed remotely via RPCs, they need to have a unique name or handle associated with them. This has to be globally unique. For example, in EJB, a hierarchical name is generated with the root based on an Internet domain name. Services have a unique URI (Uniform Resource Identifier).

Component meta-data is data about the component itself, such as information about its interfaces and attributes. The meta-data is important because it allows users of the component to find out what services are provided and required. Component model implementations normally include specific ways (such as the use of a reflection interface in Java) to access this component meta-data.

Components are generic entities, and, when deployed, they have to be configured to fit into an application system. For example, you could configure the Data collector component (Figure 16.3) by defining the maximum number of sensors in a sensor array. The component model may therefore specify how the binary components can be customized for a particular deployment environment.

3. Deployment The component model includes a specification of how components should be packaged for deployment as independent, executable routines. Because components are independent entities, they have to be packaged with all supporting software that is not provided by the component infrastructure, or is not defined in a “requires” interface. Deployment information includes information about the contents of a package and its binary organization.

Inevitably, as new requirements emerge, components will have to be changed or replaced. The component model may therefore include rules governing when and how component replacement is allowed. Finally, the component model may define the component documentation that should be produced. This is used to find the component and to decide whether it is appropriate.
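The three model elements above — explicitly defined interfaces, queryable meta-data, and deployment-time customization — can be illustrated with a small sketch. Everything here (the class name, the META dictionary layout, the describe function) is invented for illustration; real component models such as EJB or .NET define these elements through their own mechanisms.

```python
# A toy illustration of component-model elements: a declared interface,
# queryable meta-data, and a deployment-time configuration step.
# All names here are invented for illustration.

class DataCollector:
    """A component that declares its interfaces as data, not just as code."""

    # Meta-data: what the component provides and requires, so that tools
    # and other developers can discover its services without reading code.
    META = {
        "provides": ["sensorManagement", "sensorData"],
        "requires": ["addSensor", "removeSensor", "startSensor",
                     "stopSensor", "testSensor", "initialize",
                     "report", "listAll"],
    }

    def __init__(self, max_sensors=16):
        # Deployment-time customization: a generic component is configured
        # to fit a particular application (here, the size of a sensor array).
        self.max_sensors = max_sensors
        self.sensors = []

def describe(component_cls):
    """Access component meta-data, analogous to a reflection interface in Java."""
    return component_cls.META

print(describe(DataCollector)["provides"])
collector = DataCollector(max_sensors=4)
print(collector.max_sensors)
```

The point of the META structure is that a user of the component can find out what services it provides and requires without access to its source code.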


Figure 16.5 Middleware services defined in a component model (support services: component management, transaction management, resource management, concurrency, persistence, security; platform services: addressing, interface definition, exception management, component communications)

For components that are executable routines rather than external services, the component model defines the services to be provided by the middleware that supports the executing components. Weinreich and Sametinger use the analogy of an operating system to explain component models. An operating system provides a set of generic services that can be used by applications. A component model implementation provides comparable shared services for components. Figure 16.5 shows some of the services that may be provided by an implementation of a component model. The services provided by a component model implementation fall into two categories:

1. Platform services, which enable components to communicate and interoperate in a distributed environment. These are the fundamental services that must be available in all component-based systems.

2. Support services, which are common services that many different components are likely to require. For example, many components require authentication to ensure that the user of component services is authorized. It makes sense to provide a standard set of middleware services for use by all components. This reduces the costs of component development, and potential component incompatibilities can be avoided.

Middleware implements the common component services and provides interfaces to them. To make use of the services provided by a component model infrastructure, you can think of the components as being deployed in a “container.” A container is an implementation of the support services plus a definition of the interfaces that a component must provide to integrate it with the container. Conceptually, when you add a component to the container, the component can access the support services and the container can access the component interfaces. When in use, the component interfaces themselves are not accessed directly by other components. They are accessed through a container interface that invokes code to access the interface of the embedded component.
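The container idea can be sketched as follows: other components never call the embedded component directly, but go through a container interface that wraps the call with support services. This is a minimal sketch, assuming invented names; real containers (e.g., an EJB container) are far more elaborate.

```python
# A minimal sketch of the "container" idea: component interfaces are not
# called directly; a container interface invokes the embedded component
# and wraps the call with support services (here, just authentication and
# a logging stand-in). All names are invented for illustration.

class Component:
    def report(self, sensor_id):
        return f"sensor {sensor_id}: ok"

class Container:
    def __init__(self, component, authorized_users):
        self._component = component           # embedded component
        self._authorized = set(authorized_users)
        self.log = []                         # stand-in for a logging service

    def invoke(self, user, operation, *args):
        # Support service (authentication) provided once by the container
        # rather than re-implemented inside every component.
        if user not in self._authorized:
            raise PermissionError(f"{user} is not authorized")
        self.log.append((user, operation))
        # The container accesses the component's interface on behalf
        # of the caller.
        return getattr(self._component, operation)(*args)

container = Container(Component(), authorized_users=["alice"])
print(container.invoke("alice", "report", 3))
```

The design choice this illustrates is indirection: because every call passes through the container, support services can be added or changed without touching the component itself.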

Containers are large and complex and, when you deploy a component in a container, you get access to all middleware services. However, simple components may not need all of the facilities offered by the supporting middleware. The approach taken in web services to common service provision is therefore rather different. For web services, standards have been defined for common services such as transaction management and security, and these standards have been implemented as program libraries. If you are implementing a service component, you only use the common services that you need.

The services associated with a component model have much in common with the facilities provided by object-oriented frameworks, which I discussed in Chapter 15. Although the services provided may not be as comprehensive, framework services are often more efficient than container-based services. As a consequence, some people think that it is best to use frameworks such as Spring (Wheeler and White 2013) for Java development rather than the fully-featured component model in EJB.

16.2 CBSE processes

CBSE processes are software processes that support component-based software engineering. They take into account the possibilities of reuse and the different process activities involved in developing and using reusable components. Figure 16.6 (Kotonya 2003) presents an overview of the processes in CBSE. At the highest level, there are two types of CBSE processes:

1. Development for reuse This process is concerned with developing components or services that will be reused in other applications. It usually involves generalizing existing components.

2. Development with reuse This is the process of developing new applications using existing components and services.

These processes have different objectives and therefore include different activities. In the development for reuse process, the objective is to produce one or more reusable components. You know the components that you will be working with, and you have access to their source code to generalize them. In development with reuse, you don’t know what components are available, so you need to discover these components and design your system to make the most effective use of them. You may not have access to the component source code.

You can see from Figure 16.6 that the basic processes of CBSE with and for reuse have supporting processes that are concerned with component acquisition, component management, and component certification:

1. Component acquisition is the process of acquiring components for reuse or development into a reusable component. It may involve accessing locally developed components or services or finding these components from an external source.


Figure 16.6 CBSE processes (CBSE for reuse and CBSE with reuse, with supporting processes of component acquisition, component certification, and component management; roles involved include specifier, designer, integrator, maintainer, domain analyst, implementor, librarian, vendor, broker, market analyst, and component certifier, drawing on a local or external component repository or an external source)

2. Component management is concerned with managing a company’s reusable components, ensuring that they are properly catalogued, stored, and made available for reuse.

3. Component certification is the process of checking a component and certifying that it meets its specification.

Components maintained by an organization may be stored in a component repository that includes both the components and information about their use.

16.2.1 CBSE for reuse

CBSE for reuse is the process of developing reusable components and making them available for reuse through a component management system. The vision of early supporters of CBSE (Szyperski 2002) was that a thriving component marketplace would develop. There would be specialist component providers and component vendors who would organize the sale of components from different developers. Software developers would buy components to include in a system or pay for services as they were used. However, this vision has not been realized. There are relatively few component suppliers, and buying off-the-shelf components is uncommon.

Consequently, CBSE for reuse is mostly used within organizations that have made a commitment to reuse-driven software engineering. These companies have a base of internally developed components that can be reused. However, these internally developed components may not be reusable without change. They often include application-specific features and interfaces that are unlikely to be required in other programs where the component is reused.


To make components reusable, you have to adapt and extend the application-specific components to create more generic and therefore more reusable versions. Obviously, this adaptation has an associated cost. You have to decide, first, whether a component is likely to be reused and, second, whether the cost savings from future reuse justify the costs of making the component reusable.

To answer the first of these questions, you have to decide whether or not the component implements one or more stable domain abstractions. Stable domain abstractions are fundamental elements of the application domain that change slowly. For example, in a banking system, domain abstractions might include accounts, account holders, and statements. In a hospital management system, domain abstractions might include patients, treatments, and nurses. These domain abstractions are sometimes called business objects. If the component is an implementation of a commonly used domain abstraction or group of related business objects, it can probably be reused.

To answer the question about cost-effectiveness, you have to assess the costs of changes that are required to make the component reusable. These costs are the costs of component documentation and component validation, and of making the component more generic. Changes that you may make to a component to make it more reusable include:

- removing application-specific methods;
- changing names to make them more general;
- adding methods to provide more complete functional coverage;
- making exception handling consistent for all methods;
- adding a “configuration” interface to allow the component to be adapted to different situations of use;
- integrating required components to increase independence.
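Two of these generalization steps — making names more general and adding a “configuration” interface — can be sketched as a before/after pair. The component and its template format are invented for illustration, not taken from the book.

```python
# Sketch of generalizing a component for reuse. The application-specific
# version hard-codes one situation of use; the reusable version moves that
# decision into a "configuration" interface. Names are invented.

class UkAddressPrinter:
    """Application-specific: the UK format is baked into the code."""
    def print_address(self, name, postcode):
        return f"{name}\n{postcode}, United Kingdom"

class AddressPrinter:
    """More generic: the layout is configuration, not code, and the
    parameter name 'code' is more general than 'postcode'."""
    def __init__(self):
        self._template = "{name}\n{code}"

    def configure(self, template):
        # Configuration interface: adapt the component to a new situation
        # of use without changing its source code.
        self._template = template

    def print_address(self, name, code):
        return self._template.format(name=name, code=code)

print(UkAddressPrinter().print_address("I. Sommerville", "KY16 9SX"))
printer = AddressPrinter()
printer.configure("{name}\n{code}, United Kingdom")
print(printer.print_address("I. Sommerville", "KY16 9SX"))
```

The generic version does everything the specific one did, at the cost of one extra configuration step — a small instance of the reusability/understandability trade-off discussed later in this section.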

The problem of exception handling is a difficult one. In principle, components should not handle exceptions themselves because each application will have its own requirements for exception management. Rather, the component should define what exceptions can arise and should publish these exceptions as part of the interface. For example, a simple component implementing a stack data structure should detect and publish stack overflow and stack underflow exceptions. In practice, however, there are two problems with this process:

1. Publishing all exceptions leads to bloated interfaces that are harder to understand. This may put off potential users of the component.

2. The operation of the component may depend on local exception handling, and changing this may have serious implications for the functionality of the component.

You therefore have to take a pragmatic approach to component exception handling. Common technical exceptions, where recovery is important for the functioning of the component, should be handled locally. These exceptions and how they are handled should be documented with the component. Other exceptions that are related to the business function of the component should be passed to the calling component for handling.
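The stack example above can be sketched directly: the component detects overflow and underflow, publishes them as named exceptions in its interface, and leaves recovery to the caller. This is a minimal sketch of the policy, not code from the book.

```python
# Sketch of the exception policy described above, using the stack example:
# business-level exceptions (overflow, underflow) are published as part of
# the interface and passed to the calling component, which chooses its own
# recovery strategy.

class StackOverflow(Exception):
    """Published exception: the caller decides what overflow means."""

class StackUnderflow(Exception):
    """Published exception: the caller decides what underflow means."""

class Stack:
    def __init__(self, capacity):
        self._items = []
        self._capacity = capacity

    def push(self, item):
        if len(self._items) >= self._capacity:
            raise StackOverflow()       # detected here, handled by caller
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise StackUnderflow()      # detected here, handled by caller
        return self._items.pop()

# The calling application supplies its own exception management.
s = Stack(capacity=1)
s.push("x")
try:
    s.push("y")
except StackOverflow:
    print("application-specific recovery goes here")
```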

Mili et al. (Mili et al. 2002) discuss ways of estimating the costs of making a component reusable and the returns from that investment. The benefits of reusing rather than redeveloping a component are not simply productivity gains. There are also quality gains, because a reused component should be more dependable, and time-to-market gains. These are the increased returns that accrue from deploying the software more quickly. Mili et al. present various formulas for estimating these gains, as does the COCOMO model, discussed in Chapter 23. However, the parameters of these formulas are difficult to estimate accurately, and the formulas must be adapted to local circumstances, making them difficult to use. I suspect that few software project managers use these models to estimate the return on investment from component reusability.

Whether or not a component is reusable depends on its application domain, functionality, and generality. If the domain is a general one and the component implements standard functionality in that domain, then it is more likely to be reusable. As you add generality to a component, you increase its reusability because it can be applied in a wider range of environments. Unfortunately, this normally means that the component has more operations and is more complex, which makes the component harder to understand and use.

There is, therefore, a trade-off between the reusability and understandability of a component. To make a component reusable, you have to provide a set of generic interfaces with operations that cater to all of the ways in which the component could be used. Reusability adds complexity and hence reduces component understandability. This makes it more difficult and time consuming to decide whether a component is suitable for reuse. Because of the time involved in understanding a reusable component, it is sometimes more cost-effective to reimplement a simpler component with the specific functionality that is required.

A potential source of components is legacy systems. As I discussed in Chapter 9, legacy systems are systems that fulfill an important business function but are written using obsolete software technologies. As a result, it may be difficult to use them with new systems. However, if you convert these old systems to components, their functionality can be reused in new applications.

Of course, these legacy systems do not normally have clearly defined “requires” and “provides” interfaces. To make these components reusable, you have to create a wrapper that defines the component interfaces. The wrapper hides the complexity of the underlying code and provides an interface for external components to access services that are provided. Although this wrapper is a fairly complex piece of software, the cost of wrapper development may be significantly less than the cost of reimplementing the legacy system.
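The wrapper idea can be sketched as follows. The legacy interface shown here (transaction codes and positional arguments) is invented for illustration; the point is only that the wrapper translates a clean “provides” operation into whatever the old system actually expects.

```python
# Sketch of wrapping a legacy system as a component: the wrapper hides the
# legacy calling conventions and exposes a defined "provides" interface.
# The legacy interface is invented for illustration.

class LegacyAccounts:
    """Stand-in for an old system driven by cryptic transaction codes."""
    def run_txn(self, code, args):
        if code == "BAL01":
            return {"balance": 100.0, "account": args[0]}
        raise ValueError(f"unknown transaction {code}")

class AccountsComponent:
    """Wrapper: defines the component interface over the legacy code and
    hides the complexity of the underlying system."""
    def __init__(self, legacy):
        self._legacy = legacy

    def get_balance(self, account_id):
        # Translate the clean operation into the legacy transaction format.
        return self._legacy.run_txn("BAL01", [account_id])["balance"]

component = AccountsComponent(LegacyAccounts())
print(component.get_balance("AC-42"))
```

New applications depend only on AccountsComponent; the legacy calling conventions never leak out, so the legacy system could later be replaced behind the same interface.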

Once you have developed and tested a reusable component or service, it then has to be managed for future reuse. Management involves deciding how to classify the component so that it can be discovered, making the component available either in a repository or as a service, maintaining information about the use of the component, and keeping track of different component versions. If the component is open-source, you may make it available in a public repository such as GitHub or Sourceforge. If it is intended for use in a company, then you may use an internal repository system.


Figure 16.7 CBSE with reuse (outline system requirements; identify candidate components; modify requirements according to discovered components; architectural design; identify candidate components; compose components to create system)

A company with a reuse program may carry out some form of component certification before the component is made available for reuse. Certification means that someone apart from the developer checks the quality of the component. They test the component and certify that it has reached an acceptable quality standard, before it is made available for reuse. However, this process can be expensive, and so many companies simply leave testing and quality checking to the component developers.

16.2.2 CBSE with reuse

The successful reuse of components requires a development process tailored to including reusable components in the software being developed. The CBSE with reuse process has to include activities that find and integrate reusable components. The structure of such a process was discussed in Chapter 2, and Figure 16.7 shows the principal activities within that process. Some of these activities, such as the initial discovery of user requirements, are carried out in the same way as in other software processes. However, the essential differences between CBSE with reuse and software processes for original software development are as follows:

1. The user requirements are initially developed in outline rather than in detail, and stakeholders are encouraged to be as flexible as possible in defining their requirements. Requirements that are too specific limit the number of components that could meet these requirements. However, unlike incremental development, you need a complete description of the requirements so that you can identify as many components as possible for reuse.

2. Requirements are refined and modified early in the process depending on the components available. If the user requirements cannot be satisfied from available components, you should discuss the related requirements that can be supported by the reusable components. Users may be willing to change their minds if this means cheaper or quicker system delivery.

3. There is a further component search and design refinement activity after the system architecture has been designed. Apparently usable components may turn out to be unsuitable or may not work properly with other chosen components. You may have to find alternatives to these components. Further requirements changes may therefore be necessary, depending on the functionality of these components.

Figure 16.8 The component identification process (component search, then component selection, then component validation)

4. Development is a composition process where the discovered components are integrated. This involves integrating the components with the component model infrastructure and, often, developing adaptors that reconcile the interfaces of incompatible components. Of course, additional functionality may also be required over and above that provided by reused components.

The architectural design stage is particularly important. Jacobsen et al. (Jacobsen, Griss, and Jonsson 1997) found that defining a robust architecture is critical for successful reuse. During the architectural design activity, you may choose a component model and implementation platform. However, many companies have a standard development platform (e.g., .NET), so the component model is predetermined. As I discussed in Chapter 6, you also establish the high-level architecture of the system at this stage and make decisions about system distribution and control.

An activity that is unique to the CBSE process is identifying candidate components or services for reuse. This involves a number of subactivities, as shown in Figure 16.8. Initially, your focus should be on search and selection. You need to convince yourself that components are available to meet your requirements. Obviously, you should do some initial checking that the component is suitable, but detailed testing may not be required. In the later stage, after the system architecture has been designed, you should spend more time on component validation. You need to be confident that the identified components are really suited to your application; if not, then you have to repeat the search and selection processes.

The first step in identifying components is to look for components that are available within your company or from trusted suppliers. There are few component vendors, so you are most likely to be looking for components that have been developed in your own organization or in the repositories of open-source software that are available. Software development companies can build their own database of reusable components without the risks inherent in using components from external suppliers. Alternatively, you may decide to search code libraries available on the web, such as Sourceforge, GitHub, or Google Code, to see if source code for the component that you need is available.

Once the component search process has identified possible components, you have to select candidate components for assessment. In some cases, this will be a straightforward task. Components on the list will directly implement the user requirements, and there will not be competing components that match these requirements. In other cases, however, the selection process is more complex. There will not be a clear mapping of requirements onto components. You may find that several components have to be integrated to meet a specific requirement or group of requirements. You therefore have to decide which of these component compositions provide the best coverage of the requirements.

The Ariane 5 launcher failure

While developing the Ariane 5 space launcher, the designers decided to reuse the inertial reference software that had performed successfully in the Ariane 4 launcher. The inertial reference software maintains the stability of the rocket. The designers decided to reuse this without change (as you would do with components), although it included additional functionality that was not required in Ariane 5.

In the first launch of Ariane 5, the inertial navigation software failed, and the rocket could not be controlled. The rocket and its payload were destroyed. The cause of the problem was an unhandled exception when a conversion of a fixed-point number to an integer resulted in a numeric overflow. This caused the runtime system to shut down the inertial reference system, and launcher stability could not be maintained. The fault had never occurred in Ariane 4 because it had less powerful engines and the value that was converted could not be large enough for the conversion to overflow.

This illustrates an important problem with software reuse. Software may be based on assumptions about the context where the system will be used, and these assumptions may not be valid in a different situation.

More information about this failure is available at: http://software-engineering-book.com/case-studies/ariane5/

Figure 16.9 An example of validation failure with reused software

Once you have selected components for possible inclusion in a system, you should then validate them to check that they behave as advertised. The extent of the validation required depends on the source of the components. If you are using a component that has been developed by a known and trusted source, you may decide that component testing is unnecessary. You simply test the component when it is integrated with other components. On the other hand, if you are using a component from an unknown source, you should always check and test that component before including it in your system.

Component validation involves developing a set of test cases for a component (or, possibly, extending test cases supplied with that component) and developing a test harness to run component tests. The major problem with component validation is that the component specification may not be sufficiently detailed to allow you to develop a complete set of component tests. Components are usually specified informally, with the only formal documentation being their interface specification. This may not include enough information for you to develop a complete set of tests that would convince you that the component’s advertised interface is what you require.

As well as testing that a component for reuse does what you require, you may also have to check that the component does not include malicious code or functionality that you don’t need. Professional developers rarely use components from untrusted sources, especially if these sources do not provide source code. Therefore, the malicious code problem does not usually arise. However, reused components may often contain functionality that you don’t need, and you have to check that this functionality will not interfere with your use of the component.

The problem with unnecessary functionality is that it may be activated by the component itself. While this may have no effect on the application reusing the component, it can slow down the component, cause it to produce surprising results or, in exceptional cases, cause serious system failures. Figure 16.9 summarizes a situation where the failure of a reused software system, which had unnecessary functionality, led to catastrophic system failure.


The problem in the Ariane 5 launcher arose because the assumptions made about the software for Ariane 4 were invalid for Ariane 5. This is a general problem with reusable components. They are originally implemented for a specific application environment and, naturally, embed assumptions about that environment. These assumptions are rarely documented, so when the component is reused, it is impossible to develop tests to check if the assumptions are still valid. If you are reusing a component in a new environment, you may not discover the embedded environmental assumptions until you use the component in an operational system.

16.3 Component composition

Component composition is the process of integrating components with each other, and with specially written “glue code,” to create a system or another component. You can compose components in several different ways, as shown in Figure 16.10. From left to right, these diagrams illustrate sequential composition, hierarchical composition, and additive composition. In the discussion below, I assume that you are composing two components (A and B) to create a new component:

1. Sequential composition In a sequential composition, you create a new component from two existing components by calling the existing components in sequence. You can think of the composition as a composition of the “provides interfaces.” That is, the services offered by component A are called, and the results returned by A are then used in the call to the services offered by component B. The components do not call each other in sequential composition but are called by the external application. This type of composition may be used with embedded or service components.

Some extra glue code may be required to call the component services in the right order and to ensure that the results delivered by component A are compatible with the inputs expected by component B. The “glue code” transforms these outputs to be of the form expected by component B.

2. Hierarchical composition This type of composition occurs when one component calls directly on the services provided by another component. That is, component A calls component B. The called component provides the services that are required by the calling component. Therefore, the “provides” interface of the called component must be compatible with the “requires” interface of the calling component.

Component A calls on component B directly, and, if their interfaces match, there may be no need for additional code. However, if there is a mismatch between the “requires” interface of A and the “provides” interface of B, then some conversion code may be required. As services do not have a “requires” interface, this mode of composition is not used when components are implemented as services accessed over the web.


Figure 16.10 Types of component composition: (1) sequential, (2) hierarchical, and (3) additive composition of components A and B

3. Additive composition This occurs when two or more components are put together (added) to create a new component, which combines their functionality. The “provides” interface and “requires” interface of the new component are a combination of the corresponding interfaces in components A and B. The components are called separately through the external interface of the composed component and may be called in any order. A and B are not dependent and do not call each other. This type of composition may be used with embedded or service components.
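Two of these composition modes can be sketched concretely. In the sequential case, glue code in the external application calls A, collects the result, and passes it to B; in the additive case, both components sit behind one combined interface and can be called in any order. The component functions here are invented for illustration.

```python
# Sketch of sequential and additive composition of two components A and B.
# The components themselves are invented for illustration.

def component_a(text):
    return text.split(",")          # A's "provides" service

def component_b(items):
    return len(items)               # B's "provides" service

# (1) Sequential composition: the external application calls A, then feeds
# A's result to B; the components never call each other.
def sequential(text):
    intermediate = component_a(text)    # glue code collects A's result...
    return component_b(intermediate)    # ...and passes it to B

# (3) Additive composition: both components are exposed through one
# combined external interface, independently callable in any order.
class Additive:
    split = staticmethod(component_a)
    count = staticmethod(component_b)

print(sequential("a,b,c"))
print(Additive.split("x,y"))
print(Additive.count(["x", "y"]))
```

Hierarchical composition (2) would instead have component_a call component_b internally, so that B's “provides” interface satisfies A's “requires” interface.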

You might use all the forms of component composition when creating a system. In all cases, you may have to write “glue code” that links the components. For example, for sequential composition, the output of component A typically becomes the input to component B. You need intermediate statements that call component A, collect the result, and then call component B, with that result as a parameter. When one component calls another, you may need to introduce an intermediate component that ensures that the “provides” interface and the “requires” interface are compatible.

When you write new components especially for composition, you should

design the

interfaces of these components so that they are compatible with other

components in

the system. You can therefore easily compose these components into a

single unit.

However, when components are developed independently for reuse, you will often be faced with interface incompatibilities. This means that the interfaces of the components that you wish to compose are not the same. Three types of incompatibility can occur:

1. Parameter incompatibility The operations on each side of the interface have the same name, but their parameter types or the number of parameters are different. In Figure 16.11, the location parameter returned by addressFinder is incompatible with the parameters required by the displayMap and printMap methods in mapDB.

2. Operation incompatibility The names of the operations in the “provides” and “requires” interfaces are different. This is a further incompatibility between the components shown in Figure 16.11.

3. Operation incompleteness The “provides” interface of a component is a subset of the “requires” interface of another component, or vice versa.

482 Chapter 16 Component-based software engineering

Figure 16.11 Components with incompatible interfaces. The addressFinder component requires phoneDatabase (string command) and provides string location (string pn), string owner (string pn), and string propertyType (string pn). The mapper component requires mapDB (string command) and provides displayMap (string postCode, scale) and printMap (string postCode, scale).

In all cases, you tackle the problem of incompatibility by writing an adaptor that reconciles the interfaces of the two components being reused. An adaptor component converts one interface to another.

The precise form of the adaptor depends on the type of composition. Sometimes, as in the next example, the adaptor takes a result from one component and converts it into a form where it can be used as an input to another. In other cases, the adaptor may be called by component A as a proxy for component B. This situation occurs if A wishes to call B, but the details of the “requires” interface of A do not match the details of the “provides” interface of B. The adaptor reconciles these differences by converting its input parameters from A into the required input parameters for B. It then calls B to deliver the services required by A.

To illustrate adaptors, consider the two simple components shown in Figure 16.11, whose interfaces are incompatible. These might be part of a system used by the emergency services. When the emergency operator takes a call, the phone number is input to the addressFinder component to locate the address. Then, using the mapper component, the operator prints a map to be sent to the vehicle dispatched to the emergency.

The first component, addressFinder, finds the address that matches a phone number. It can also return the owner of the property associated with the phone number and the type of property. The mapper component takes a post code (in the United States, a standard ZIP code with the additional four digits identifying property location) and displays or prints a street map of the area around that code at a specified scale.

These components are composable in principle because the property location includes the post or ZIP code. However, you have to write an adaptor component called postCodeStripper that takes the location data from addressFinder and strips out the post code. This post code is then used as an input to mapper, and the street map is displayed at a scale of 1:10,000. The following code, which is an example of sequential composition, illustrates the sequence of calls that is required to implement this process:

address = addressFinder.location (phonenumber);
postCode = postCodeStripper.getPostCode (address);
mapper.displayMap (postCode, 10000);
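The postCodeStripper adaptor itself is not shown in the book. A minimal sketch, assuming the post code is the final comma-separated field of the location string, might be:

```java
// Hypothetical adaptor: converts the location string produced by
// addressFinder into the post code input that mapper requires.
class PostCodeStripper {
    // Assumes a location such as "12 Mill Lane, Cambridge, CB2 1RX",
    // with the post code as the last comma-separated field.
    String getPostCode(String location) {
        String[] fields = location.split(",");
        return fields[fields.length - 1].trim();
    }
}
```

A real adaptor would have to cope with the actual address format used by addressFinder; the point is only that the adaptor contains all the conversion logic, so neither reused component needs to change.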

Another case in which an adaptor component may be used is in hierarchical composition, where one component wishes to make use of another but there is an incompatibility between the “provides” interface and “requires” interface of the components in the composition. I have illustrated the use of an adaptor in Figure 16.12, where an adaptor is used to link a data collector and a sensor component. These could be used in the implementation of a wilderness weather station system, as discussed in Chapter 7.

Figure 16.12 An adaptor linking a data collector and a sensor. The Data collector component offers operations such as addSensor, removeSensor, startSensor, stopSensor, testSensor, initialize, report, and listAll, and has a “requires” interface (sensorManagement, sensorData) that the Adaptor maps onto the sensor component’s start, stop, and getdata operations.

The sensor and data collector components are composed using an adaptor that reconciles the “requires” interface of the data collection component with the “provides” interface of the sensor component. The data collector component has been designed with a generic “requires” interface that supports sensor data collection and sensor management. For each of these operations, the parameter is a text string representing the specific sensor commands. For example, to issue a collect command, you would say sensorData(“collect”). As I have shown in Figure 16.12, the sensor itself has separate operations such as start, stop, and getdata.

The adaptor parses the input string, identifies the command (e.g., collect), and then calls Sensor.getdata to collect the sensor value. It then returns the result (as a character string) to the data collector component. This interface style means that the data collector can interact with different types of sensor. A separate adaptor, which converts the sensor commands from Data collector to the sensor interface, is implemented for each type of sensor.
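A sketch of one such adaptor, with a hypothetical Sensor class standing in for the real sensor component and an invented placeholder reading, might look like this:

```java
// Hypothetical sensor component with its own specific interface,
// as in Figure 16.12: start, stop, and getdata operations.
class Sensor {
    private boolean running = false;
    void start() { running = true; }
    void stop()  { running = false; }
    String getdata() { return running ? "21.5" : ""; } // placeholder value
}

// Adaptor: implements the data collector's generic text-command
// interface on top of the sensor's specific interface. A separate
// adaptor like this would be written for each type of sensor.
class SensorAdaptor {
    private final Sensor sensor = new Sensor();

    // Parses the command string, maps it to a sensor operation, and
    // returns any result to the data collector as a character string.
    String sensorData(String command) {
        switch (command) {
            case "start":   sensor.start(); return "ok";
            case "stop":    sensor.stop();  return "ok";
            case "collect": return sensor.getdata();
            default:        return "error: unknown command " + command;
        }
    }
}
```

The data collector only ever calls sensorData with a command string, so swapping in a different sensor means writing a new adaptor, not changing the collector.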

The above discussion of component composition assumes you can tell from the component documentation whether or not interfaces are compatible. Of course, the interface definition includes the operation name and parameter types, so you can make some assessment of the compatibility from this. However, you depend on the component documentation to decide whether the interfaces are semantically compatible.

To illustrate this problem, consider the composition shown in Figure 16.13. These components are used to implement a system that downloads images from a camera and stores them in a photograph library. The system user can provide additional information to describe and catalog the photograph. To avoid clutter, I have not shown all interface methods here. Rather, I simply show the methods that are needed to illustrate the component documentation problem. The methods in the interface of Photo Library are:

public void addItem (Identifier pid; Photograph p; CatalogEntry photodesc);
public Photograph retrieve (Identifier pid);
public CatalogEntry catEntry (Identifier pid);

Figure 16.13 Photo library composition. The User Interface component uses getImage and getCatalogEntry; an adaptor links the Image Manager to the Photo Library component, whose interface includes the addItem, retrieve, and catEntry operations.

Figure 16.14 The OCL description of the Photo Library interface:

-- The context keyword names the component to which the conditions apply
context addItem
-- The preconditions specify what must be true before execution of addItem
pre:  PhotoLibrary.libSize() > 0
      PhotoLibrary.retrieve(pid) = null
-- The postconditions specify what is true after execution
post: libSize() = libSize()@pre + 1
      PhotoLibrary.retrieve(pid) = p
      PhotoLibrary.catEntry(pid) = photodesc

context delete
pre:  PhotoLibrary.retrieve(pid) <> null
post: PhotoLibrary.retrieve(pid) = null
      PhotoLibrary.catEntry(pid) = PhotoLibrary.catEntry(pid)@pre
      PhotoLibrary.libSize() = libSize()@pre - 1

Assume that the documentation for the addItem method in Photo Library is:

This method adds a photograph to the library and associates the photograph identifier and catalog descriptor with the photograph.

This description appears to explain what the component does, but consider the following questions:

What happens if the photograph identifier is already associated with a photograph in the library?

Is the photograph descriptor associated with the catalog entry as well as the photograph? That is, if you delete the photograph, do you also delete the catalog information?

There is not enough information in the informal description of addItem to answer these questions. Of course, it is possible to add more information to the natural language description of the method, but in general, the best way to resolve ambiguities is to use a formal language to describe the interface. The specification shown in Figure 16.14 is part of the description of the interface of Photo Library that adds information to the informal description.


Figure 16.14 shows pre- and postconditions that are defined in a notation based on the Object Constraint Language (OCL), which is part of the UML (Warmer and Kleppe 2003). OCL is designed to describe constraints in UML object models; it allows you to express predicates that must always be true, that must be true before a method has executed, and that must be true after a method has executed. These are invariants, preconditions, and postconditions, respectively. To access the value of a variable before an operation, you add @pre after its name. Therefore, using age as an example:

age = age@pre + 1

This statement means that the value of age after an operation is one more than it was before that operation.

OCL-based approaches are primarily used in model-based software engineering to add semantic information to UML models. The OCL descriptions may be used to drive code generators in model-driven engineering. The general approach has been derived from Meyer’s Design by Contract approach (Meyer 1992), in which the interfaces and obligations of communicating objects are formally specified and enforced by the runtime system. Meyer suggests that using Design by Contract is essential if we are to develop trusted components (Meyer 2003).
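The spirit of Design by Contract can be shown by checking pre- and postconditions at run time. The sketch below is a simplified stand-in for the Photo Library component, using a plain map for storage, string-valued photographs, and Java assertions for part of the addItem contract in Figure 16.14; run with java -ea so that assertions are enabled:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the Photo Library component. Photographs are
// plain strings here, and only part of the addItem contract is checked.
class PhotoLibrary {
    private final Map<String, String> photos = new HashMap<>();

    int libSize() { return photos.size(); }
    String retrieve(String pid) { return photos.get(pid); }

    void addItem(String pid, String photograph) {
        // Precondition: the identifier must not already be in use.
        assert retrieve(pid) == null : "identifier already in use: " + pid;
        int sizeAtPre = libSize(); // corresponds to libSize()@pre
        photos.put(pid, photograph);
        // Postconditions: exactly one entry added, retrievable by pid.
        assert libSize() == sizeAtPre + 1;
        assert retrieve(pid).equals(photograph);
    }
}
```

A contract violation then fails loudly at the component boundary, during testing, rather than silently corrupting the library.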

Figure 16.14 shows the specification for the addItem and delete methods in Photo Library. The method being specified is indicated by the keyword context and the pre- and postconditions by the keywords pre and post. The preconditions for addItem state that:

1. There must not be a photograph in the library with the same identifier as the photograph to be entered.

2. The library must exist. Assume that creating a library adds a single item to it, so that the size of a library is always greater than zero.

The postconditions for addItem state that:

1. The size of the library has increased by 1 (so only a single entry has been made).

2. If you retrieve using the same identifier, then you get back the photograph that you added.

3. If you look up the catalog using that identifier, you get back the catalog entry that you made.

The specification of delete provides further information. The precondition states that to delete an item, it must be in the library; after deletion, the photo can no longer be retrieved, and the size of the library is reduced by 1. However, delete does not delete the catalog entry: you can still retrieve it after the photo has been deleted. The reason for this is that you may wish to maintain information in the catalog about why a photo was deleted, its new location, and so on.

When you create a system by composing components, you may find that there are potential conflicts between functional and non-functional requirements, the need to deliver a system as quickly as possible, and the need to create a system that can evolve as requirements change. You may have to take trade-offs into account for component decisions:

1. What composition of components is most effective for delivering the functional requirements for the system?

2. What composition of the components will make it easier to adapt the composite component when its requirements change?

3. What will be the emergent properties of the composed system? These properties include performance and dependability. You can only assess these properties once the complete system is implemented.

Figure 16.15 Data collection and report generation components. In composition (a), separate data collection, data management, and report generator components cooperate to produce reports; in composition (b), a data collection component feeds a database with built-in report generation.

Unfortunately, in many situations the solutions to the composition problems may conflict. For example, consider a situation such as that illustrated in Figure 16.15, where a system can be created through two alternative compositions. The system is a data collection and reporting system where data is collected from different sources, stored in a database, and then different reports summarizing that data are produced.

Here, there is a potential conflict between adaptability and performance. Composition (a) is more adaptable, but composition (b) is likely to be faster and more reliable. The advantages of composition (a) are that reporting and data management are separate, so there is more flexibility for future change. The data management system could be replaced, and, if reports are required that the current reporting component cannot produce, that component can also be replaced without having to change the data management component.

In composition (b), a database component with built-in reporting facilities (e.g., Microsoft Access) is used. The key advantage of composition (b) is that there are fewer components, so this will be a faster implementation because there are no component communication overheads. Furthermore, data integrity rules that apply to the database will also apply to reports. These reports will not be able to combine data in incorrect ways. In composition (a), there are no such constraints, so errors in reports could occur.

In general, a good composition principle to follow is the principle of separation of concerns. That is, you should try to design your system so that each component has a clearly defined role. Ideally, component roles should not overlap. However, it may be cheaper to buy one multifunctional component rather than two or three separate components. Furthermore, dependability or performance penalties may be incurred when multiple components are used.


Key Points

Component-based software engineering is a reuse-based approach to defining, implementing, and composing loosely coupled independent components into systems.

A component is a software unit whose functionality and dependencies are completely defined by a set of public interfaces. Components can be composed with other components without knowledge of their implementation and can be deployed as an executable unit.

Components may be implemented as executable routines that are included in a system or as external services that are referenced from within a system.

A component model defines a set of standards for components, including interface standards, usage standards, and deployment standards. The implementation of the component model provides a set of common services that may be used by all components.

During the CBSE process, you have to interleave the processes of requirements engineering and system design. You have to trade off desirable requirements against the services that are available from existing reusable components.

Component composition is the process of “wiring” components together to create a system. Types of composition include sequential composition, hierarchical composition, and additive composition.

When composing reusable components that have not been written for your application, you may need to write adaptors or “glue code” to reconcile the different component interfaces.

When choosing compositions, you have to consider the required functionality of the system, the non-functional requirements, and the ease with which one component can be replaced when the system is changed.

Further Reading

Component Software: Beyond Object-Oriented Programming, 2nd ed. This updated edition of the first book on CBSE covers technical and nontechnical issues in CBSE. It has more detail on specific technologies than Heineman and Councill’s book and includes a thorough discussion of market issues. (C. Szyperski, Addison-Wesley, 2002)

“Specification, Implementation and Deployment of Components.” A good introduction to the fundamentals of CBSE. The same issue of the CACM includes articles on components and component-based development. (I. Crnkovic, B. Hnich, T. Jonsson, and Z. Kiziltan, Comm. ACM, 45(10), October 2002) http://dx.doi.org/10.1145/570907.570928

“Software Component Models.” This comprehensive discussion of commercial and research component models classifies these models and explains the differences between them. (K-K. Lau and Z. Wang, IEEE Transactions on Software Engineering, 33(10), October 2007) http://dx.doi.org/10.1109/TSE.2007.70726


“Software Components Beyond Programming: From Routines to Services.” This is the opening article in a special issue of the magazine that includes several articles on software components. This article discusses the evolution of components and how service-oriented components are replacing executable program routines. (I. Crnkovic, J. Stafford, and C. Szyperski, IEEE Software, 28(3), May/June 2011) http://dx.doi.org/10.1109/MS.2011.62

Object Constraint Language (OCL) Tutorial. A good introduction to the use of the object-constraint language. (J. Cabot, 2012) http://modeling-languages.com/ocl-tutorial/

Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/software-reuse/

A more detailed discussion of the Ariane 5 accident: http://software-engineering-book.com/case-studies/ariane5/

Exercises

16.1. What are the design principles underlying CBSE that support the construction of understandable and maintainable software?

16.2. The principle of component independence means that it ought to be possible to replace one component with another that is implemented in a completely different way. Using an example, explain how such component replacement could have undesired consequences and may lead to system failure.

16.3. In a reusable component, what are the critical characteristics that are emphasized when the component is viewed as a service?

16.4. Why is it important that components should be based on a standard component model?

16.5. Using an example of a component that implements an abstract data type such as a stack or a list, show why it is usually necessary to extend and adapt components for reuse.

16.6. What are the essential differences between CBSE with reuse and software processes for original software development?

16.7. Design the “provides” interface and the “requires” interface of a reusable component that may be used to represent a patient in the Mentcare system that I introduced in Chapter 1.

16.8. Using examples, illustrate the different types of adaptor needed to support sequential composition, hierarchical composition, and additive composition.

16.9. Design the interfaces of components that might be used in a system for an emergency control room. You should design interfaces for a call-logging component that records calls made, and a vehicle discovery component that, given a post code (zip code) and an incident type, finds the nearest suitable vehicle to be dispatched to the incident.

16.10. It has been suggested that an independent certification authority should be established. Vendors would submit their components to this authority, which would validate that the component was trustworthy. What would be the advantages and disadvantages of such a certification authority?

References

Councill, W. T., and G. T. Heineman. 2001. “Definition of a Software Component and Its Elements.” In Component-Based Software Engineering, edited by G. T. Heineman and W. T. Councill, 5–20. Boston: Addison-Wesley.

Jacobsen, I., M. Griss, and P. Jonsson. 1997. Software Reuse. Reading, MA: Addison-Wesley.

Kotonya, G. 2003. “The CBSE Process: Issues and Future Visions.” In 2nd CBSEnet Workshop. Budapest, Hungary. http://miro.sztaki.hu/projects/cbsenet/budapest/presentations/Gerald-CBSEProcess.ppt

Lau, K-K., and Z. Wang. 2007. “Software Component Models.” IEEE Trans. on Software Eng. 33 (10): 709–724. doi:10.1109/TSE.2007.70726.

Meyer, B. 1992. “Applying Design by Contract.” IEEE Computer 25 (10): 40–51. doi:10.1109/2.161279.

Meyer, B. 2003. “The Grand Challenge of Trusted Components.” In Proc. 25th Int. Conf. on Software Engineering. Portland, OR: IEEE Press. doi:10.1109/ICSE.2003.1201252.

Mili, H., A. Mili, S. Yacoub, and E. Addy. 2002. Reuse-Based Software Engineering. New York: John Wiley & Sons.

Pope, A. 1997. The CORBA Reference Guide: Understanding the Common Object Request Broker Architecture. Harlow, UK: Addison-Wesley.

Szyperski, C. 2002. Component Software: Beyond Object-Oriented Programming, 2nd ed. Harlow, UK: Addison-Wesley.

Warmer, J., and A. Kleppe. 2003. The Object Constraint Language: Getting Your Models Ready for MDA. Boston: Addison-Wesley.

Weinreich, R., and J. Sametinger. 2001. “Component Models and Component Services: Concepts and Principles.” In Component-Based Software Engineering, edited by G. T. Heineman and W. T. Councill, 33–48. Boston: Addison-Wesley.

Wheeler, W., and J. White. 2013. Spring in Practice. Greenwich, CT: Manning Publications.

17 Distributed software engineering

Objectives

The objective of this chapter is to introduce distributed systems engineering and distributed systems architectures. When you have read this chapter, you will:

know the key issues that have to be considered when designing and implementing distributed software systems;

understand the client–server computing model and the layered architecture of client–server systems;

have been introduced to commonly used patterns for distributed systems architectures and know the types of system for which each architectural pattern is applicable;

understand the notion of software as a service, providing web-based access to remotely deployed application systems.

Contents

17.1 Distributed systems
17.2 Client–server computing
17.3 Architectural patterns for distributed systems
17.4 Software as a service


Most computer-based systems are now distributed systems. A distributed system is one involving several computers rather than a single application running on a single machine. Even apparently self-contained applications on a PC or laptop, such as image editors, are distributed systems. They execute on a single computer system but often rely on remote cloud systems for update, storage, and other services. Tanenbaum and Van Steen (Tanenbaum and Van Steen 2007) define a distributed system to be “a collection of independent computers that appears to the user as a single coherent system.”†

When you are designing a distributed system, there are specific issues that have to be taken into account simply because the system is distributed. These issues arise because different parts of the system are running on independently managed computers and because the characteristics of the network, such as latency and reliability, may have to be considered in your design.

Coulouris et al. (Coulouris et al. 2011) identify five benefits of developing systems as distributed systems:

1. Resource sharing A distributed system allows the sharing of hardware and software resources, such as disks, printers, files, and compilers, that are associated with computers on a network.

2. Openness Distributed systems are normally open systems: systems designed around standard Internet protocols so that equipment and software from different vendors can be combined.

3. Concurrency In a distributed system, several processes may operate at the same time on separate computers on the network. These processes may (but need not) communicate with each other during their normal operation.

4. Scalability In principle at least, distributed systems are scalable in that the capabilities of the system can be increased by adding new resources to cope with new demands on the system. In practice, the network linking the individual computers in the system may limit the system scalability.

5. Fault tolerance The availability of several computers and the potential for replicating information means that distributed systems can be tolerant of some hardware and software failures (see Chapter 11). In most distributed systems, a degraded service can be provided when failures occur; complete loss of service only occurs when there is a network failure.‡

Distributed systems are inherently more complex than centralized systems. This makes them more difficult to design, implement, and test. It is harder to understand the emergent properties of distributed systems because of the complexity of the interactions between system components and system infrastructure. For example, rather than being dependent on the execution speed of one processor, system performance depends on network bandwidth, network load, and the speed of other computers that are part of the system. Moving resources from one part of the system to another can significantly affect the system’s performance.

†Tanenbaum, A. S., and M. Van Steen. 2007. Distributed Systems: Principles and Paradigms, 2nd ed. Upper Saddle River, NJ: Prentice-Hall.

‡Coulouris, G., J. Dollimore, T. Kindberg, and G. Blair. 2011. Distributed Systems: Concepts and Design, 5th ed. Harlow, UK: Addison-Wesley.

Furthermore, as all users of the WWW know, distributed systems are unpredictable in their response. Response time depends on the overall load on the system, its architecture, and the network load. As all of these factors may change over a short time, the time taken to respond to a user request may change significantly from one request to another.

The most important developments that have affected distributed software systems in the past few years are service-oriented systems and the advent of cloud computing, delivering infrastructure, platforms, and software as a service. In this chapter, I focus on general issues of distributed systems, and in Section 17.4 I cover the idea of software as a service. In Chapter 18, I discuss other aspects of service-oriented software engineering.

17.1 Distributed systems

As I discussed in the introduction to this chapter, distributed systems are more complex than systems that run on a single processor. This complexity arises because it is practically impossible to have a top-down model of control of these systems. The nodes in the system that deliver functionality are often independent systems that are managed and controlled by their owners. There is no single authority in charge of the entire distributed system. The network connecting these nodes is also a separately managed system. It is a complex system in its own right and cannot be controlled by the owners of systems using the network. There is, therefore, an inherent unpredictability in the operation of distributed systems that has to be taken into account when you are designing a system.

Some of the most important design issues that have to be considered in distributed systems engineering are:

1. Transparency To what extent should the distributed system appear to the user as a single system? When is it useful for users to understand that the system is distributed?

2. Openness Should a system be designed using standard protocols that support interoperability, or should more specialized protocols be used? Although standard network protocols are now universally used, this is not the case for higher levels of interaction, such as service communication.

3. Scalability How can the system be constructed so that it is scalable? That is, how can the overall system be designed so that its capacity can be increased in response to increasing demands made on the system?

4. Security How can usable security policies be defined and implemented that apply across a set of independently managed systems?

5. Quality of service How should the quality of service that is delivered to system users be specified, and how should the system be implemented to deliver an acceptable quality of service to all users?

6. Failure management How can system failures be detected, contained (so that they have minimal effects on other components in the system), and repaired?


CORBA: Common Object Request Broker Architecture

CORBA was proposed as a specification for a middleware system in the 1990s by the Object Management Group. It was intended as an open standard that would allow the development of middleware to support distributed component communications and execution, as well as provide a set of standard services that could be used by these components. Several implementations of CORBA were produced, but the system was not widely adopted. Users preferred proprietary systems such as those from Microsoft or Oracle, or they moved to service-oriented architectures.

http://software-engineering-book.com/web/corba/

In an ideal world, the fact that a system is distributed would be transparent to users. Users would see the system as a single system whose behavior is not affected by the way that the system is distributed. In practice, this is impossible to achieve because there is no central control over the system as a whole. As a result, individual computers in a system may behave differently at different times. Furthermore, because it always takes a finite length of time for signals to travel across a network, network delays are unavoidable. The length of these delays depends on the location of resources in the system, the quality of the user’s network connection, and the network load.

To make a distributed system transparent (i.e., conceal its distributed nature), you have to hide the underlying distribution. You create abstractions that hide the system resources so that the location and implementation of these resources can be changed without having to change the distributed application. Middleware (discussed in Section 17.1.2) is used to map the logical resources referenced by a program onto the actual physical resources and to manage resource interactions.

In practice, it is impossible to make a system completely transparent, and users, generally, are aware that they are dealing with a distributed system. You may therefore decide that it is best to expose the distribution to users. They can then be prepared for some of the consequences of distribution, such as network delays and remote node failures.

Open distributed systems are built according to generally accepted standards. Components from any supplier can therefore be integrated into the system and can interoperate with the other system components. At the networking level, openness is now taken for granted, with systems conforming to Internet protocols, but at the component level, openness is still not universal. Openness implies that system components can be independently developed in any programming language and, if these conform to standards, they will work with other components.

The CORBA standard (Pope 1997), developed in the 1990s, was intended to be the universal standard for open distributed systems. However, the CORBA standard never achieved a critical mass of adopters. Rather, many companies preferred to develop systems using proprietary standards for components from companies such as Sun (now Oracle) and Microsoft. These provided better implementations and support software and better long-term support for industrial protocols.

494 Chapter 17 Distributed software engineering

Web service standards (discussed in Chapter 18) for service-oriented architectures were developed to be open standards. However, these standards have met with significant resistance because of their perceived inefficiency. Many developers of service-based systems have opted instead for so-called RESTful protocols because these have an inherently lower overhead than web service protocols. The use of RESTful protocols is not standardized.

The scalability of a system reflects its ability to deliver high-quality service

as

demands on the system increase. The three dimensions of scalability are

size, distribution, and manageability.

1. Size It should be possible to add more resources to a system to cope

with increasing numbers of users. Ideally, then, as the number of users

increases, the system

should increase in size automatically to handle the increased number of

users.

2. Distribution It should be possible to geographically disperse the

components of a system without degrading its performance. As new

components are added, it

should not matter where these are located. Large companies can often

make use

of computing resources in their different facilities around the world.

3. Manageability It should be possible to manage a system as it increases in

size, even if parts of the system are located in independent organizations.

This is one

of the most difficult challenges of scale as it involves managers

communicating

and agreeing on management policies. In practice, the manageability of a

system is often the factor that limits the extent to which it can be scaled.

Changing the size of a system may involve either scaling up or scaling out.

Scaling

up means replacing resources in the system with more powerful resources.

For example, you may increase the memory in a server from 16 GB to 64 GB.

Scaling out means

adding more resources to the system (e.g., an extra web server to work

alongside an

existing server). Scaling out is often more cost-effective than scaling up,

especially now that cloud computing makes it easy to add or remove

servers from a system.

However, this only provides performance improvements when concurrent

processing

is possible.
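The point that scaling out only pays off when concurrent processing is possible can be illustrated with a crude Amdahl-style model. This is a sketch; the function and its parameters are invented for illustration and are not from the chapter:

```python
def makespan(n_requests: int, n_servers: int, task_time: float,
             parallel_fraction: float = 1.0) -> float:
    """Crude model of scaling out: only the fraction of the workload
    that can be processed concurrently benefits from extra servers."""
    serial = (1.0 - parallel_fraction) * n_requests * task_time
    parallel = parallel_fraction * n_requests * task_time / n_servers
    return serial + parallel

# One server: 100 requests of 0.5s each are processed sequentially.
print(makespan(100, 1, 0.5))                          # 50.0
# Four servers, fully concurrent workload: near-linear speedup.
print(makespan(100, 4, 0.5))                          # 12.5
# Four servers, but half the work cannot run concurrently.
print(makespan(100, 4, 0.5, parallel_fraction=0.5))   # 31.25
```

With a largely serial workload, adding servers leaves the total time dominated by the serial term, which is why scaling out alone does not guarantee better performance.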

I have discussed general security issues and issues of security engineering

in Part 2 of this book. When a system is distributed, attackers may target

any of the individual system components or the network itself. If a part of

the system is successfully attacked, then the attacker may be able to use

this as a “back door” into other parts of the system.

A distributed system must defend itself against the following types of

attack:

1. Interception, where an attacker intercepts communications between

parts of the system so that there is a loss of confidentiality.

2. Interruption, where system services are attacked and cannot be delivered

as expected. Denial-of-service attacks involve bombarding a node with

illegitimate

service requests so that it cannot deal with valid requests.

3. Modification, where an attacker gains access to the system and changes

data or system services.

4. Fabrication, where an attacker generates information that should not

exist and then uses this information to gain some privileges. For example,

an attacker

may generate a false password entry and use this to gain access to a

system.


The major difficulty in distributed systems is establishing a security policy

that

can be reliably applied to all of the components in a system. As I discussed

in Chapter 13, a security policy sets out the level of security to be

achieved by a system. Security mechanisms, such as encryption and

authentication, are used to enforce the security

policy. The difficulties in a distributed system arise because different

organizations may own parts of the system. These organizations may have

mutually incompatible

security policies and security mechanisms. Security compromises may

have to be

made in order to allow the systems to work together.

The quality of service (QoS) offered by a distributed system reflects the

system’s

ability to deliver its services dependably and with a response time and

throughput

that are acceptable to its users. Ideally, the QoS requirements should be

specified in advance and the system designed and configured to deliver

that QoS. Unfortunately,

this is not always practicable for two reasons:

1. It may not be cost-effective to design and configure the system to

deliver a high quality of service under peak load. The peak demands may

mean that you need

many more servers than normal to ensure that response times are

maintained.

This problem has been lessened by the advent of cloud computing where

cloud

servers may be rented from a cloud provider for as long as they are

required. As

demand increases, extra servers can be automatically added.

2. The quality-of-service parameters may be mutually contradictory. For

example,

increased reliability may mean reduced throughput, as checking

procedures are

introduced to ensure that all system inputs are valid.

Quality of service is particularly important when the system is dealing

with time-

critical data such as sound or video streams. In these circumstances, if the

quality of service falls below a threshold value then the sound or video

may become so

degraded that it is impossible to understand. Systems dealing with sound

and video

should include quality of service negotiation and management

components. These

should evaluate the QoS requirements against the available resources and,

if these

are insufficient, negotiate for more resources or for a reduced QoS target.
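The negotiation step described above can be sketched as follows. This is a minimal illustration; the function name, parameters, and bandwidth figures are invented for the example:

```python
def negotiate_qos(required_kbps: int, available_kbps: int,
                  min_acceptable_kbps: int):
    """Hypothetical negotiation step for a streaming component: try to
    reserve the required bandwidth; if the resource manager cannot
    supply it, fall back to a reduced QoS target or reject the stream."""
    if available_kbps >= required_kbps:
        return ("accept", required_kbps)      # full-quality stream
    if available_kbps >= min_acceptable_kbps:
        return ("degrade", available_kbps)    # reduced QoS target
    return ("reject", 0)                      # cannot deliver a usable service

print(negotiate_qos(4000, 5000, 1500))   # ('accept', 4000)
print(negotiate_qos(4000, 2500, 1500))   # ('degrade', 2500)
print(negotiate_qos(4000, 800, 1500))    # ('reject', 0)
```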

In a distributed system, it is inevitable that failures will occur, so the

system has to be designed to be resilient to these failures. Failure is so

ubiquitous that one flippant definition of a distributed system suggested

by Leslie Lamport, a prominent distributed systems researcher, is:

You know that you have a distributed system when the crash of a system that

you’ve never heard of stops you getting any work done.

This is even truer now that more and more systems are executing in the

cloud.

†Leslie Lamport, in Ross J. Anderson, Security Engineering: A Guide to Building Dependable Distributed Systems (2nd ed.), Wiley (April 14, 2008).

Figure 17.1 Procedural interaction between a diner and a waiter:
Waiter: What would you like?            Diner: Tomato soup please
Waiter: And to follow?                  Diner: Fillet steak
Waiter: How would you like it cooked?   Diner: Rare please
Waiter: With salad or french fries?     Diner: Salad please
(etc.)

Failure management involves applying the fault-tolerance techniques discussed in Chapter 11. Distributed systems should therefore include mechanisms for discovering whether a component of the system has failed, should continue to deliver as many services as possible in spite of that failure, and, as far as possible, should automatically recover from the failure. One important benefit of cloud computing is that it has dramatically reduced the cost of providing redundant system components.
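One common way to discover whether a component has failed is a heartbeat timeout. The sketch below is illustrative only; the class and the timeout policy are not from the chapter:

```python
class HeartbeatMonitor:
    """Toy failure detector: a node is presumed failed if no heartbeat
    has been received within the timeout window."""
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now          # record the latest heartbeat

    def failed_nodes(self, now: float) -> list[str]:
        # Any node whose last heartbeat is older than the timeout
        # is reported as (presumed) failed.
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout]

m = HeartbeatMonitor(timeout=5.0)
m.heartbeat("web-1", now=0.0)
m.heartbeat("db-1", now=0.0)
m.heartbeat("web-1", now=4.0)   # web-1 reports again; db-1 stays silent
print(m.failed_nodes(now=8.0))  # ['db-1']
```

A timeout-based detector cannot distinguish a crashed node from a slow network, which is one reason complete transparency is impossible in practice.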

17.1.1 Models of interaction

Two fundamental types of interaction may take place between the

computers in a distributed computing system: procedural interaction and message-based

interaction.

Procedural interaction involves one computer calling on a known service

offered by

some other computer and waiting for that service to be delivered.

Message-based

interaction involves the “sending” computer defining information about

what is

required in a message, which is then sent to another computer. Messages

usually transmit more information in a single interaction than a procedure

call to another machine.

To illustrate the difference between procedural and message-based

interaction,

consider a situation where you are ordering a meal in a restaurant. When

you have a

conversation with the waiter, you are involved in a series of synchronous,

procedural interactions that define your order. You make a request, the

waiter acknowledges

that request, you make another request, which is acknowledged, and so

on. This is

comparable to components interacting in a software system where one

component

calls methods from other components. The waiter writes down your order

along with

the order of other people with you. He or she then passes this order,

which includes details of everything that has been ordered, to the kitchen

to prepare the food.

Essentially, the waiter is passing a message to the kitchen staff, defining

the food to be prepared. This is message-based interaction.

I have illustrated this kind of interaction in Figure 17.1, which shows the synchronous ordering process as a series of calls, and in Figure 17.2, which shows a hypothetical XML message that defines an order made by the table of three people. The difference between these forms of information exchange is clear. The waiter takes the order as a series of interactions, with each interaction defining part of the order. However, the waiter has a single interaction with the kitchen where the message defines the complete order.

Figure 17.2 Message-based interaction between a waiter and the kitchen staff:

<starter>
  <dish name = "soup" type = "tomato" />
  <dish name = "soup" type = "fish" />
  <dish name = "pigeon salad" />
</starter>
<main>
  <dish name = "steak" type = "sirloin" cooking = "medium" />
  <dish name = "steak" type = "fillet" cooking = "rare" />
  <dish name = "sea bass" />
</main>
<accompaniment>
  <dish name = "french fries" portions = "2" />
  <dish name = "salad" portions = "1" />
</accompaniment>

Procedural communication in a distributed system is usually implemented

using

remote procedure calls (RPCs). In an RPC, components have globally

unique names

(such as a URL). Using that name, a component can call on the services

offered by

another component as if it was a local procedure or method. System

middleware

intercepts this call and passes it on to a remote component. This carries

out the

required computation and, via the middleware, returns the result to the

calling

component. In Java, remote method invocations (RMIs) are remote

procedure calls.

Remote procedure calls require a “stub” for the called procedure to be

accessible on the computer that is initiating the call. This stub defines the

interface of the remote procedure. The stub is called, and it translates the

procedure parameters into a standard representation for transmission to

the remote procedure. Through the middleware, it

then sends the request for execution to the remote procedure. The remote

procedure

uses library functions to convert the parameters into the required format,

carries out the computation, and then returns the results via the “stub”

that is representing the caller.
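The stub mechanism can be sketched as follows. This is a toy illustration in Python, with JSON standing in for the middleware's standard parameter representation and a direct function call standing in for the network hop; all the names are invented. Real middleware (Java RMI, for example) adds naming, networking, and error handling:

```python
import json

def server_dispatch(request_bytes: bytes) -> bytes:
    """Server side: unmarshal the request, run the named procedure,
    and marshal the result for return to the caller."""
    procedures = {"add": lambda a, b: a + b}   # remote procedures on offer
    request = json.loads(request_bytes)
    result = procedures[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode()

def add_stub(a: int, b: int) -> int:
    """Client-side stub: looks like a local call, but marshals the
    procedure name and parameters and delegates execution remotely."""
    request = json.dumps({"proc": "add", "args": [a, b]}).encode()
    reply = server_dispatch(request)           # stands in for the network hop
    return json.loads(reply)["result"]

print(add_stub(2, 3))   # 5
```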

Message-based interaction normally involves one component creating a

message that

details the services required from another component. This message is sent

to the receiving component via the system middleware. The receiver

parses the message, carries out the computations, and creates a message

for the sending component with the required

results. This is then passed to the middleware for transmission to the

sending component.

A problem with the RPC approach to interaction is that both the caller and

the

callee need to be available at the time of the communication, and they

must know

how to refer to each other. In essence, an RPC has the same requirements

as a local

procedure or method call. By contrast, in a message-based approach,

unavailability

can be tolerated. If the system component that is processing the message is

unavailable, the message simply stays in a queue until the receiver comes back

online.

Furthermore, it is not necessary for the sender to know the name of the

message

receiver and vice versa. They simply communicate with the middleware,

which is

responsible for ensuring that messages are passed to the appropriate

system.
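The queueing behavior described above can be sketched with a toy broker. This is illustrative only; real message-oriented middleware adds persistence, delivery guarantees, and networking:

```python
from collections import deque

class MessageBroker:
    """Sketch of middleware-mediated messaging: senders and receivers
    know only the broker. If a receiver is offline, its messages
    simply wait in the queue until it reconnects."""
    def __init__(self):
        self.queues: dict[str, deque] = {}

    def send(self, topic: str, message: str) -> None:
        self.queues.setdefault(topic, deque()).append(message)

    def receive(self, topic: str):
        q = self.queues.get(topic)
        return q.popleft() if q else None

broker = MessageBroker()
broker.send("kitchen", "order: tomato soup, fillet steak (rare), salad")
# ... the kitchen component may be down at this point; the message waits ...
print(broker.receive("kitchen"))
print(broker.receive("kitchen"))   # None: the queue is now empty
```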

Figure 17.3 Middleware in a distributed system: two systems, each layered as application components, middleware, operating system, and networking. The application components carry out coordinated operation; the middleware layers provide information exchange and common services; the operating systems support logical interaction; and the networking layers provide the physical connectivity.

17.1.2 Middleware

The components in a distributed system may be implemented in different

programming languages and may execute on different types of processors. Models

of data,

information representation, and protocols for communication may all be

different. A

distributed system therefore requires software that can manage these

diverse parts

and ensure that they can communicate and exchange data.

The term middleware is used to refer to this software—it sits in the middle

between the distributed components of the system. This concept is

illustrated in Figure 17.3, which shows that middleware is a layer between

the operating system and application

programs. Middleware is normally implemented as a set of libraries, which

are installed on each distributed computer, plus a runtime system to

manage communications.

Bernstein (Bernstein 1996) describes types of middleware that are

available to

support distributed computing. Middleware is general-purpose software

that is usually bought off-the-shelf rather than written specially by application

developers.

Examples of middleware include software for managing communications

with databases, transaction managers, data converters, and communication

controllers.

In a distributed system, middleware provides two distinct types of support:

1. Interaction support, where the middleware coordinates interactions

between different components in the system. The middleware provides

location transparency in

that it isn’t necessary for components to know the physical locations of

other components. It may also support parameter conversion if different

programming languages

are used to implement components, event detection, communication, and

so on.

2. The provision of common services, where the middleware provides

reusable

implementations of services that may be required by several components

in the

distributed system. By using these common services, components can

easily

interoperate and provide user services in a consistent way.

I have already given examples of the interaction support that middleware

can provide in Section 17.1.1. You use middleware to support remote procedure

and remote

method calls, message exchange, and so forth.


Common services are those services that may be required by different

components irrespective of the functionality of these components. As I discussed

in Chapter 16, these may include security services (authentication and

authorization), notification and naming services, and transaction

management services. For distributed

components, you can think of these common services as being provided by

a middleware container; for services, they are provided through shared libraries.

You then deploy your component, and it can access and use these common

services.

17.2 Client–server computing

Distributed systems that are accessed over the Internet are organized as

client–server systems. In a client–server system, the user interacts with a

program running on their local computer, such as a web browser or app

on a mobile device. This interacts with another program running on a

remote computer, such as a web server. The remote

computer provides services, such as access to web pages, which are

available to

external clients. This client–server model, as I discussed in Chapter 6, is a

general architectural model of an application. It is not restricted to

applications distributed across several machines. You can also use it as a

logical interaction model where the client and the server run on the same

computer.

In a client–server architecture, an application is modeled as a set of

services that are provided by servers. Clients may access these services and

present results to end-users.

Clients need to be aware of the servers that are available but don’t have to

know anything about other clients. Clients and servers are separate

processes, as shown in Figure 17.4. This figure illustrates a situation in

which there are four servers (s1–s4) that deliver different services. Each

service has a set of associated clients that access these services.

Figure 17.4 shows client and server processes rather than processors. It is

normal

for several client processes to run on a single processor. For example, on

your PC,

you may run a mail client that downloads mail from a remote mail server.

You may

also run a web browser that interacts with a remote web server and a

print client that sends documents to a remote printer. Figure 17.5 shows a

possible arrangement

where the 12 logical clients shown in Figure 17.4 are running on six

computers. The

four server processes are mapped onto two physical server computers.

Several different server processes may run on the same processor, but,

often,

servers are implemented as multiprocessor systems in which a separate

instance of

the server process runs on each machine. Load-balancing software

distributes

requests for service from clients to different servers so that each server

does the

same amount of work. This allows a higher volume of transactions with

clients to be

handled, without degrading the response to individual clients.
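The simplest load-balancing policy, round-robin assignment, can be sketched as follows. This is illustrative; production load balancers also take account of each server's current load and health:

```python
import itertools

# Client requests are spread across a pool of identical server
# instances so that each instance does roughly the same work.
servers = ["server-a", "server-b", "server-c"]
next_server = itertools.cycle(servers).__next__

assignments = [next_server() for _ in range(6)]
print(assignments)
# ['server-a', 'server-b', 'server-c', 'server-a', 'server-b', 'server-c']
```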

Client–server systems depend on there being a clear separation between the presentation of information and the computations that create and process that information. Consequently, you should design the architecture of distributed client–server systems so that they are structured into several logical layers, with clear interfaces between these layers. This allows each layer to be distributed to a different computer.

Figure 17.4 Client–server interaction: four server processes (s1–s4) deliver different services, and each service has its own set of client processes (c1–c12) that access it.

Figure 17.6 illustrates this model, showing an application structured into four layers:

1. A presentation layer that is concerned with presenting information to the user and managing all user interaction.

2. A data-handling layer that manages the data that is passed to and from the client. This layer may implement checks on the data, generate web pages, and so on.

3. An application processing layer that is concerned with implementing the logic of the application and so providing the required functionality to end-users.

4. A database layer that stores the data and provides transaction management and query services.
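One way to make these layer boundaries concrete is to give each layer an explicit interface, so that a layer could later be moved to another machine behind the same interface. The classes, methods, and data below are invented for illustration:

```python
class DatabaseLayer:
    """Tier holding the stored data and answering queries."""
    def __init__(self):
        self.rows = {"acct-1": 120}
    def query(self, account: str) -> int:
        return self.rows[account]

class ApplicationLayer:
    """Implements the application logic using the database layer."""
    def __init__(self, db: DatabaseLayer):
        self.db = db
    def balance_report(self, account: str) -> dict:
        return {"account": account, "balance": self.db.query(account)}

class DataHandlingLayer:
    """Prepares data for the client, e.g. by generating web pages."""
    def __init__(self, app: ApplicationLayer):
        self.app = app
    def to_page(self, account: str) -> str:
        report = self.app.balance_report(account)
        return f"<p>{report['account']}: {report['balance']}</p>"

class PresentationLayer:
    """Presents information to the user and manages interaction."""
    def __init__(self, handler: DataHandlingLayer):
        self.handler = handler
    def show(self, account: str) -> str:
        return self.handler.to_page(account)

ui = PresentationLayer(DataHandlingLayer(ApplicationLayer(DatabaseLayer())))
print(ui.show("acct-1"))   # <p>acct-1: 120</p>
```

Because each layer talks only to the one below it, any of the constructor arguments could be replaced by a proxy for a process on another computer without changing the other layers.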

The following section explains how different client–server architectures

distribute these logical layers in different ways. The client–server model also

underlies the notion of software as a service (SaaS), an important way of

deploying software and

accessing it over the Internet. I cover this topic in Section 17.4.

Figure 17.5 Mapping of clients and servers to networked computers: the 12 logical clients (c1–c12) from Figure 17.4 run on six client computers (CC1–CC6), and the four server processes (s1–s4) are mapped onto two server computers (SC1 and SC2), all connected by a network.

Figure 17.6 Layered architectural model for a client–server application: presentation, data handling, application processing, and database layers.

17.3 Architectural patterns for distributed systems

As I explained in the introduction to this chapter, designers of distributed

systems have to organize their system designs to find a balance between

performance, dependability, security, and manageability of the system.

Because no universal model of

system organization is appropriate for all circumstances, various

distributed architectural styles have emerged. When designing a

distributed application, you should

choose an architectural style that supports the critical non-functional

requirements of your system.

In this section, I discuss five architectural styles:

1. Master-slave architecture, which is used in real-time systems in which

guaranteed interaction response times are required.

2. Two-tier client–server architecture, which is used for simple client–server

systems and in situations where it is important to centralize the system for

security reasons.

3. Multi-tier client–server architecture, which is used when the server has to

process a high volume of transactions.

4. Distributed component architecture, which is used when resources from

different systems and databases need to be combined, or as an

implementation model

for multi-tier client–server systems.

5. Peer-to-peer architecture, which is used when clients exchange locally

stored information and the role of the server is to introduce clients to each

other. It

may also be used when a large number of independent computations may

have

to be made.

17.3.1 Master–slave architectures

Master–slave architectures for distributed systems are commonly used in real-time systems. In those systems, there may be separate processors associated with data acquisition from the system’s environment, data processing and computation, and actuator management. Actuators, as I discuss in Chapter 21, are devices controlled by the software system that act to change the system’s environment. For example, an actuator may control a valve and change its state from “open” to “closed.” The “master” process is usually responsible for computation, coordination, and communications, and it controls the “slave” processes. “Slave” processes are dedicated to specific actions, such as the acquisition of data from an array of sensors.

Figure 17.7 A traffic management system with a master–slave architecture: a master coordination and display process on the control room processor (attached to operator consoles) communicates with a slave sensor control process on the sensor processor (attached to traffic flow sensors and cameras) and a slave light control process on the traffic light control processor (attached to the traffic lights).

Figure 17.7 shows an example of this architectural model. A traffic control

system in a city has three logical processes that run on separate processors.

The master process is the control room process, which communicates with

separate slave processes that are responsible for collecting traffic data and

managing the operation of traffic lights.

A set of distributed sensors collects information on the traffic flow. The

sensor

control process polls the sensors periodically to capture the traffic flow

information and collates this information for further processing. The sensor

processor is

itself polled periodically for information by the master process that is

concerned

with displaying traffic status to operators, computing traffic light

sequences, and

accepting operator commands to modify these sequences. The control

room system sends commands to a traffic light control process that converts these into signals to control the traffic light hardware. The master control room system

is itself organized as a client–server system, with the client processes

running on the operator’s consoles.

You use this master–slave model of a distributed system in situations

where you

can predict the distributed processing that is required and where

processing can be

easily localized to slave processors. This situation is common in real-time

systems, where it is important to meet processing deadlines. Slave

processors can be used for computationally intensive operations, such as

signal processing and the management

of equipment controlled by the system.
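The polling relationship between the master and its slave processes can be sketched as follows. This is a toy illustration of the traffic example; the class names, the traffic threshold, and the light sequences are all invented:

```python
class SensorSlave:
    """Slave dedicated to collecting and collating sensor data."""
    def __init__(self):
        self.readings = []
    def record(self, vehicles_per_min: int) -> None:
        self.readings.append(vehicles_per_min)
    def poll(self) -> float:
        """Return collated (averaged) traffic-flow data and reset."""
        if not self.readings:
            return 0.0
        avg = sum(self.readings) / len(self.readings)
        self.readings.clear()
        return avg

class LightControlSlave:
    """Slave that converts commands into traffic-light signals."""
    def command(self, sequence: str) -> str:
        return f"lights set to {sequence}"

class Master:
    """Master process: polls the sensor slave and commands the lights."""
    def __init__(self, sensor: SensorSlave, lights: LightControlSlave):
        self.sensor, self.lights = sensor, lights
    def control_cycle(self) -> str:
        flow = self.sensor.poll()
        sequence = "long-green" if flow > 30 else "normal"
        return self.lights.command(sequence)

sensor, lights = SensorSlave(), LightControlSlave()
for v in (42, 38, 45):
    sensor.record(v)            # data acquired by the sensor slave
master = Master(sensor, lights)
print(master.control_cycle())   # lights set to long-green
```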

Figure 17.8 Thin- and fat-client architectural models: in the thin-client model, the client runs only the presentation layer, with data management, application processing, and the database on the server; in the fat-client model, the client runs the presentation and application processing layers, with data management and the database on the server.

17.3.2 Two-tier client–server architectures

In Section 17.2, I explained the general organization of client–server

systems in which part of the application system runs on the user’s

computer (the client), and part runs on a remote computer (the server). I

also presented a layered application model

(Figure 17.6) where the different layers in the system may execute on

different

computers.

A two-tier client–server architecture is the simplest form of client–server

architecture. The system is implemented as a single logical server plus an

indefinite number of clients that use that server. This is illustrated in

Figure 17.8, which shows two forms of this architectural model:

1. A thin-client model, where the presentation layer is implemented on the

client and all other layers (data handling, application processing, and

database) are

implemented on a server. The client presentation software is usually a web

browser, but apps for mobile devices may also be available.

2. A fat-client model, where some or all of the application processing is

carried out on the client. Data management and database functions are

implemented on the

server. In this case, the client software may be a specially written program

that

is tightly integrated with the server application.

The advantage of the thin-client model is that it is simple to manage the

clients.

This becomes a major issue when there are a large number of clients, as it

may be

difficult and expensive to install new software on all of them. If a web

browser is

used as the client, there is no need to install any software.

The disadvantage of the thin-client approach, however, is that it places a

heavy

processing load on both the server and the network. The server is

responsible for all computation, which may lead to the generation of

significant network traffic between the client and the server.

Implementing a system using this model may therefore

require additional investment in network and server capacity.

The fat-client model makes use of available processing power on the computer running the client software, and distributes some or all of the application processing and the presentation to the client. The server is essentially a transaction server that manages all database transactions. Data handling is straightforward as there is no need to manage the interaction between the client and the application processing system. The fat-client model requires system management to deploy and maintain the software on the client computer.

Figure 17.9 A fat-client architecture for an ATM system: several ATMs connect to a teleprocessing monitor, which mediates their transactions with the account server and its customer account database.

An example of a situation in which a fat-client architecture is used is in a

bank

ATM system, which delivers cash and other banking services to users. The

ATM is the

client computer, and the server is, typically, a mainframe running the

customer account database. A mainframe computer is a powerful machine

that is designed for transaction processing. It can therefore handle the

large volume of transactions generated by ATMs, other teller systems, and

online banking. The software in the teller machine

carries out a lot of the customer-related processing associated with a

transaction.

Figure 17.9 shows a simplified version of the ATM system organization.

The ATMs

do not connect directly to the customer database, but rather to a

teleprocessing (TP) monitor. A TP monitor is a middleware system that

organizes communications with remote

clients and serializes client transactions for processing by the database.

This ensures that transactions are independent and do not interfere with

one another. Using serial transactions means that the system can recover

from faults without corrupting the system data.
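What the TP monitor contributes can be sketched as follows. This is a toy illustration: transactions are accepted concurrently but applied strictly one at a time, so they cannot interleave and corrupt the account data. The class and account names are invented:

```python
from collections import deque

class TPMonitor:
    """Sketch of a teleprocessing monitor: queue client transactions
    and apply them serially against the account database."""
    def __init__(self, balances: dict):
        self.balances = balances
        self.pending = deque()

    def submit(self, account: str, amount: int) -> None:
        self.pending.append((account, amount))   # accepted from a remote ATM

    def run_serially(self) -> None:
        while self.pending:
            account, amount = self.pending.popleft()
            self.balances[account] += amount     # one transaction at a time

tp = TPMonitor({"acct-9": 100})
tp.submit("acct-9", -30)   # withdrawal at one ATM
tp.submit("acct-9", 50)    # deposit at another
tp.run_serially()
print(tp.balances["acct-9"])   # 120
```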

While a fat-client model distributes processing more effectively than a

thin-client

model, system management is more complex if a special-purpose client,

rather than

a browser, is used. Application functionality is spread across many

computers. When

the application software has to be changed, this involves software

reinstallation on every client computer. This can be a major cost if there

are hundreds of clients in the system. Auto-update of the client software

can reduce these costs but introduces its own problems if the client

functionality is changed. The new functionality may mean

that businesses have to change the ways they use the system.

The extensive use of mobile devices means that it is important to minimize network traffic wherever possible. These devices now include powerful computers that can carry out local processing. As a consequence, the distinction between thin-client and fat-client architectures has become blurred. Apps can have inbuilt functionality that carries out local processing, and web pages may include JavaScript components that execute on the user’s local computer. The update problem for apps remains an issue, but it has been addressed, to some extent, by automatically updating apps without explicit user intervention. Consequently, while it is sometimes helpful to use these models as a general basis for the architecture of a distributed system, in practice few web-based applications implement all processing on the remote server.

Figure 17.10 Three-tier architecture for an Internet banking system: clients (tier 1, presentation) interact over HTTPS with a web server that provides the account service (tier 2, application processing and data handling); the web server issues SQL queries to a database server that manages the customer account database (tier 3, database processing).

17.3.3 Multi-tier client–server architectures

The fundamental problem with a two-tier client–server approach is that

the logical layers in the system—presentation, application processing, data

management, and database—

must be mapped onto two computer systems: the client and the server.

This may lead to problems with scalability and performance if the thin-

client model is chosen, or problems of system management if the fat-client

model is used. To avoid some of these problems, a “multi-tier client–server” architecture can be used. In this architecture, the different layers

of the system, namely presentation, data management, application

processing, and database, are separate processes that may execute on

different processors.

An Internet banking system (Figure 17.10) is an example of a multi-tier

client–server architecture, where there are three tiers in the system. The bank’s

customer

database (usually hosted on a mainframe computer as discussed above)

provides

database services. A web server provides data management services such

as web

page generation and some application services. Application services such

as facilities to transfer cash, generate statements, pay bills, and so on are

implemented in the web server and as scripts that are executed by the

client. The user’s own computer

with an Internet browser is the client. This system is scalable because it is

relatively easy to add servers (scale out) as the number of customers

increases.

In this case, the use of a three-tier architecture allows the information

transfer

between the web server and the database server to be optimized. Efficient

middleware that supports database queries in SQL (Structured Query Language)

is used to

handle information retrieval from the database.
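The tier separation described above can be sketched in outline. The following fragment is illustrative only: the account schema, the function names, and the in-memory SQLite database are invented stand-ins for the web server, the application services, and the mainframe database of Figure 17.10.

```python
import sqlite3

# Data tier: owns the SQL; nothing else in the system touches the database directly.
def fetch_balance(conn, account_id):
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0] if row else None

# Application tier: business logic, independent of presentation and storage details.
def balance_report(conn, account_id):
    balance = fetch_balance(conn, account_id)
    if balance is None:
        raise KeyError(account_id)
    return {"account": account_id, "balance": balance}

# Presentation tier: turns application results into a page for the browser client.
def render_page(report):
    return "<p>Account %s: %.2f</p>" % (report["account"], report["balance"])

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stands in for the bank's database server
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
    conn.execute("INSERT INTO accounts VALUES ('A-100', 250.0)")
    print(render_page(balance_report(conn, "A-100")))
```

Because each tier only calls the one below it through a narrow interface, each could be moved to its own server without changing the others.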

506 Chapter 17 Distributed software engineering

Figure 17.11 Use of client–server architectural patterns

Two-tier client–server architecture with thin clients
- Legacy system applications that are used when separating application processing and data handling is impractical. Clients may access these as services, as discussed in Section 17.4.
- Computationally intensive applications such as compilers with little or no requirements for data handling.
- Data-intensive applications (browsing and querying) with non-intensive application processing. Simple web browsing is the most common example of a situation where this architecture is used.

Two-tier client–server architecture with fat clients
- Applications where application processing is provided by off-the-shelf software (e.g., Microsoft Excel) on the client.
- Applications where computationally intensive processing of data (e.g., data visualization) is required.
- Mobile applications where internet connectivity cannot be guaranteed. Local processing using cached information from the database is therefore possible.

Multi-tier client–server architecture
- Large-scale applications with hundreds or thousands of clients.
- Applications where both the data and the application are volatile.
- Applications where data from multiple sources are integrated.

The three-tier client–server model can be extended to a multi-tier variant, where additional servers are added to the system. This may involve using a web server for data management and separate servers for application processing and database services. Multi-tier systems may also be used when applications need to access and use data from different databases. In this case, you may need to add an integration server to the system. The integration server collects the distributed data and presents it to the application server as if it were from a single database. As I discuss in the following section, distributed component architectures may be used to implement multi-tier client–server systems.

Multi-tier client–server systems that distribute application processing across several servers are more scalable than two-tier architectures. The tiers in the system can be independently managed, with additional servers added as the load increases. Processing may be distributed between the application logic and the data-handling servers, thus leading to more rapid response to client requests.

Designers of client–server architectures must take a number of factors into account when choosing the most appropriate distribution architecture. Situations in which the client–server architectures discussed here may be used are described in Figure 17.11.

17.3.4 Distributed component architectures

By organizing processing into layers, as shown in Figure 17.6, each layer of a system can be implemented as a separate logical server. This model works well for many types of application. However, it limits the flexibility of system designers in that they have to decide what services should be included in each layer. In practice, however, it is not always clear whether a service is a data management service, an application service, or a database service. Designers must also plan for scalability and so provide some means for servers to be replicated as more clients are added to the system.

Figure 17.12 A distributed component architecture

A more general approach to distributed system design is to design the system as a set of services, without attempting to allocate these services to layers in the system. Each service, or group of related services, can be implemented using a separate object or component. In a distributed component architecture (Figure 17.12), the system is organized as a set of interacting components, as I discussed in Chapter 16. These components provide an interface to the set of services that they offer. Other components call on these services through middleware, using remote procedure or method calls.
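The middleware-mediated interaction described above can be sketched with Python’s built-in XML-RPC support, which here stands in for production middleware such as EJB or .NET remoting. The StockComponent class and its service interface are invented for illustration.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# A service-providing component. Its public methods form the service interface
# that the middleware publishes to other components.
class StockComponent:
    def __init__(self):
        self._stock = {"widget": 40}

    def level(self, item):
        return self._stock.get(item, 0)

# The middleware layer: marshals parameters and results across the network.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_instance(StockComponent())
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# A service-requesting component calls the service by remote method call,
# without knowing where the providing component actually executes.
proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d" % port)
print(proxy.level("widget"))  # → 40
server.shutdown()
```

The caller addresses the component only by its network location, so the providing component could run on any node, as the text goes on to note.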

Distributed component systems are reliant on middleware. This manages component interactions, reconciles differences between the types of the parameters passed between components, and provides a set of common services that application components can use. The CORBA standard (Orfali, Harkey, and Edwards 1997) defined middleware for distributed component systems, but CORBA implementations have never been widely adopted. Enterprises preferred to use proprietary software such as Enterprise JavaBeans (EJB) or .NET.

Using a distributed component model for implementing distributed systems has a number of benefits:

1. It allows the system designer to delay decisions on where and how services should be provided. Service-providing components may execute on any node of the network. There is no need to decide in advance whether a service is part of a data management layer, an application layer, or a user interface layer.

2. It is a very open-system architecture that allows new resources to be added as required. New system services can be added easily without major disruption to the existing system.

3. The system is flexible and scalable. New objects or replicated objects can be added as the load on the system increases, without disrupting other parts of the system.

Figure 17.13 A distributed component architecture for a data-mining system

4. It is possible to reconfigure the system dynamically, with components migrating across the network as required. This may be important where there are fluctuating patterns of demand on services. A service-providing component can migrate to the same processor as service-requesting objects, thus improving the performance of the system.
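Benefit 4 might be sketched with a simple location registry, a hypothetical stand-in for the middleware bookkeeping that lets callers find a component after it migrates. The names used here are invented for illustration.

```python
# A registry records which node currently hosts each component, so a
# component can migrate and callers still reach it by name.
class Registry:
    def __init__(self):
        self._location = {}

    def place(self, component, node):
        self._location[component] = node   # register or migrate a component

    def locate(self, component):
        return self._location[component]   # callers look up the current host

registry = Registry()
registry.place("pricing", "node-A")
registry.place("pricing", "node-B")        # migrate nearer its callers
print(registry.locate("pricing"))          # → node-B
```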

A distributed component architecture can be used as a logical model that allows you to structure and organize the system. In this case, you think about how to provide application functionality solely in terms of services and combinations of services. You then work out how to implement these services. For example, a retail application may have application components concerned with stock control, customer communications, goods ordering, and so on.

Data-mining systems are a good example of a type of system that can be implemented using a distributed component architecture. Data-mining systems look for relationships between the data that may be distributed across databases (Figure 17.13). These systems pull in information from several separate databases, carry out computationally intensive processing, and present easy-to-understand visualizations of the relationships that have been discovered.

An example of such a data-mining application might be a system for a retail business that sells food and books. Retail businesses maintain separate databases with detailed information about food products and books. They use a loyalty card system to keep track of customers’ purchases, so there is a large database linking bar codes of products with customer information. The marketing department wants to find relationships between a customer’s food and book purchases. For instance, a relatively high proportion of people who buy pizzas might also buy crime novels. With this knowledge, the business can specifically target customers who make specific food purchases with information about new novels when they are published.


In this example, each sales database can be encapsulated as a distributed component with an interface that provides read-only access to its data. Integrator components are each concerned with specific types of relationships, and they collect information from all of the databases to try to deduce the relationships. There might be an integrator component that is concerned with seasonal variations in goods sold, and another integrator that is concerned with relationships between different types of goods. Visualizer components interact with integrator components to create a visualization or a report on the relationships that have been discovered. Because of the large volumes of data that are handled, visualizer components normally present their results graphically. Finally, a display component may be responsible for delivering the graphical models to clients for final presentation in their web browser.

A distributed component architecture rather than a layered architecture is appropriate for this type of application because you can add new databases to the system without major disruption. Each new database is simply accessed by adding another distributed component. The database access components provide a simplified interface that controls access to the data. The databases that are accessed may reside on different machines. The architecture also makes it easy to mine new types of relationships by adding new integrator objects.
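The arrangement of database and integrator components might be sketched as follows. All class names, interfaces, and data here are hypothetical; in a real system each component would run on its own node behind middleware rather than in one process.

```python
# Each database is wrapped in a component exposing read-only access.
class SalesDatabase:
    def __init__(self, purchases):
        self._purchases = purchases          # (customer, product) pairs

    def purchases_by(self, customer):        # the read-only interface
        return [p for c, p in self._purchases if c == customer]

# An integrator collects information from all of the databases to deduce
# relationships; new integrators can be added without touching the databases.
class CrossSellIntegrator:
    def __init__(self, databases):
        self._databases = databases

    def combined_purchases(self, customer):
        items = []
        for db in self._databases:
            items.extend(db.purchases_by(customer))
        return sorted(items)

food = SalesDatabase([("c1", "pizza"), ("c2", "salad")])
books = SalesDatabase([("c1", "crime novel")])
integrator = CrossSellIntegrator([food, books])
print(integrator.combined_purchases("c1"))   # → ['crime novel', 'pizza']
```

Adding a new database means constructing one more wrapper component and handing it to the integrators, which is the low-disruption property the text describes.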

Distributed component architectures suffer from two major disadvantages:

1. They are more complex to design than client–server systems. Multilayer client–server systems appear to be a fairly intuitive way to think about systems. They reflect many human transactions where people request and receive services from other people who specialize in providing these services. The complexity of distributed component architectures increases the costs of implementation.

2. There are no universal standards for distributed component models or middleware. Rather, different vendors, such as Microsoft and Sun, developed different, incompatible middleware. This middleware is complex, and reliance on it significantly increases the complexity of distributed component systems.

As a result of these problems, distributed component architectures are being replaced by service-oriented systems (discussed in Chapter 18). However, distributed component systems have performance benefits over service-oriented systems. RPC communications are usually faster than the message-based interaction used in service-oriented systems. Distributed component architectures are therefore still used for high-throughput systems in which large numbers of transactions have to be processed quickly.

17.3.5 Peer-to-peer architectures

The client–server model of computing that I have discussed in previous sections of the chapter makes a clear distinction between servers, which are providers of services, and clients, which are receivers of services. This model usually leads to an uneven distribution of load on the system, where servers do more work than clients. This may lead to organizations spending a lot on server capacity while there is unused processing capacity on the hundreds or thousands of PCs and mobile devices used to access the system servers.


Peer-to-peer (p2p) systems (Oram 2001) are decentralized systems in which computations may be carried out by any node on the network. In principle at least, no distinctions are made between clients and servers. In peer-to-peer applications, the overall system is designed to take advantage of the computational power and storage available across a potentially huge network of computers. The standards and protocols that enable communications across the nodes are embedded in the application itself, and each node must run a copy of that application.

Peer-to-peer technologies have mostly been used for personal rather than business systems. The fact that there are no central servers means that these systems are harder to monitor; therefore, a higher level of communication privacy is possible. For example, file-sharing systems based on the BitTorrent protocol are widely used to exchange files on users’ PCs. Private instant messaging systems, such as ICQ and Jabber, provide direct communications between users without an intermediate server. Bitcoin is a peer-to-peer payments system using the Bitcoin electronic currency. Freenet is a decentralized database that has been designed to make it easier to publish information anonymously and to make it difficult for authorities to suppress this information.

Other p2p systems have been developed where privacy is not the principal requirement. Voice over IP (VoIP) phone services, such as Viber, rely on peer-to-peer communication between the parties involved in the phone call or conference. SETI@home is a long-running project that processes data from radio telescopes on home PCs in order to search for indications of extraterrestrial life. In these systems, the advantage of the p2p model is that a central server is not a processing bottleneck.

Peer-to-peer systems have also been used by businesses to harness the power in their PC networks (McDougall 2000). Intel and Boeing have both implemented p2p systems for computationally intensive applications. Such systems take advantage of unused processing capacity on local computers. Instead of buying expensive high-performance hardware, engineering computations can be run overnight when desktop computers are unused. Businesses also make extensive use of commercial p2p systems, such as messaging and VoIP systems.

In principle, every node in a p2p network could be aware of every other node. Nodes could connect to and exchange data directly with any other node in the network. In practice, this is impossible unless the network has only a few members. Consequently, nodes are usually organized into “localities,” with some nodes acting as bridges to other node localities. Figure 17.14 shows this decentralized p2p architecture.

In a decentralized architecture, the nodes in the network are not simply functional elements but are also communications switches that can route data and control signals from one node to another. For example, assume that Figure 17.14 represents a decentralized document-management system. A consortium of researchers uses this system to share documents. Each member of the consortium maintains his or her own document store. However, when a document is retrieved, the node retrieving that document also makes it available to other nodes.

If someone needs a document that is stored somewhere on the network, they issue a search command, which is sent to nodes in their “locality.” These nodes check whether they have the document and, if so, return it to the requestor. If they do not have it, they route the search to other nodes. Therefore, if n1 issues a search for a document that is stored at n10, this search is routed through nodes n3, n6, and n9 to n10. When the document is finally discovered, the node holding the document then sends it to the requesting node directly by making a peer-to-peer connection.

Figure 17.14 A decentralized p2p architecture
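The flooded search just described can be sketched as a toy in-process simulation. The Node class and its routing logic are invented for illustration; real p2p protocols add hop limits, distributed hash tables, or similar mechanisms to curb the duplicated work discussed below.

```python
# Each node knows only its neighbours, checks its own store, and otherwise
# routes the search onward. The visited set stops a query looping forever.
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbours = []
        self.documents = {}

    def search(self, title, visited=None):
        visited = visited if visited is not None else set()
        if self.name in visited:
            return None                     # this node already saw the query
        visited.add(self.name)
        if title in self.documents:
            return self.documents[title]    # found locally: return to requestor
        for peer in self.neighbours:        # otherwise route to the locality
            found = peer.search(title, visited)
            if found is not None:
                return found
        return None

n1, n3, n10 = Node("n1"), Node("n3"), Node("n10")
n1.neighbours = [n3]
n3.neighbours = [n1, n10]
n10.documents["report"] = "contents of report"
print(n1.search("report"))   # → contents of report
```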

This decentralized architecture has the advantage of being highly redundant and hence both fault-tolerant and tolerant of nodes disconnecting from the network. However, the disadvantages are that many different nodes may process the same search, and there is also significant overhead in replicated peer communications.

An alternative p2p architectural model, which departs from a pure p2p architecture, is a semicentralized architecture where, within the network, one or more nodes act as servers to facilitate node communications. This reduces the amount of traffic between nodes. Figure 17.15 illustrates how this semicentralized architectural model differs from the completely decentralized model shown in Figure 17.14.

In a semicentralized architecture, the role of the server (sometimes called a superpeer) is to help establish contact between peers in the network or to coordinate the results of a computation. For example, if Figure 17.15 represents an instant messaging system, then network nodes communicate with the server (indicated by dashed lines) to find out what other nodes are available. Once these nodes are discovered, direct communications can be established and the connection to the server becomes unnecessary. Therefore, nodes n2, n3, n5, and n6 are in direct communication.

In a computational p2p system, where a processor-intensive computation is distributed across a large number of nodes, it is normal for some nodes to be superpeers. Their role is to distribute work to other nodes and to collate and check the results of the computation.
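A superpeer’s distribute-and-collate role might be sketched as follows. The per-molecule analysis and the round-robin distribution are invented stand-ins for a real computation farmed out to peer nodes.

```python
# A peer's computation: each work item is independent, so peers never need
# to talk to one another. The acceptance criterion here is invented.
def peer_analyse(molecule):
    return molecule, len(molecule) > 5

class SuperPeer:
    def __init__(self, peers):
        self.peers = peers                   # worker functions stand in for nodes

    def run(self, molecules):
        results = {}
        for i, molecule in enumerate(molecules):
            peer = self.peers[i % len(self.peers)]   # round-robin distribution
            name, hit = peer(molecule)
            results[name] = hit              # collate results centrally
        return results

superpeer = SuperPeer([peer_analyse, peer_analyse])
print(superpeer.run(["benzene", "ccc"]))
# → {'benzene': True, 'ccc': False}
```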

The peer-to-peer architectural model may be the best model for a distributed system in two circumstances:

1. Where the system is computationally intensive and it is possible to separate the processing required into a large number of independent computations. For example, a peer-to-peer system that supports computational drug discovery distributes computations that look for potential cancer treatments by analyzing a huge number of molecules to see if they have the characteristics required to suppress the growth of cancers. Each molecule can be considered separately, so there is no need for the peers in the system to communicate.

Figure 17.15 A semicentralized p2p architecture

2. Where the system primarily involves the exchange of information between individual computers on a network and there is no need for this information to be centrally stored or managed. Examples of such applications include file-sharing systems that allow peers to exchange local files such as music and video files, and phone systems that support voice and video communications between computers.

Peer-to-peer architectures allow for the efficient use of capacity across a network. However, security concerns are the principal reason why these systems have not become more widely used, especially in business (Wallach 2003). The lack of centralized management means that attackers can set up malicious nodes that deliver spam and malware to legitimate p2p system users. Peer-to-peer communications involve opening your computer to direct interactions with other peers, and this means that these systems could potentially access any of your resources. To counter this possibility, you need to organize your system so that these resources are protected. If this is done incorrectly, then your system is insecure and vulnerable to external corruption.

17.4 Software as a service

In the previous sections, I discussed client–server models and how functionality may be distributed between the client and the server. To implement a client–server system, you may have to install a program or an app on the client computer, which communicates with the server, implements client-side functionality, and manages the user interface. For example, a mail client, such as Outlook or Mac Mail, provides mail management features on your own computer. This avoids the problem of server overload in thin-client systems, where all of the processing is carried out at the server.

The problems of server overload can be significantly reduced by using web technologies such as AJAX (Holdener 2008) and HTML5 (Sarris 2013). These technologies support efficient management of web page presentation and local computation by executing scripts that are part of the web page. This means that a browser can be configured and used as a client, with significant local processing. The application software can be thought of as a remote service, which can be accessed from any device that can run a standard browser. Widely used examples of SaaS include web-based mail systems, such as Yahoo and Gmail, and office applications, such as Google Docs and Office 365.

This idea of software as a service (SaaS) involves hosting the software remotely and providing access to it over the Internet. The key elements of SaaS are as follows:

1. Software is deployed on a server (or more commonly in the cloud) and is accessed through a web browser. It is not deployed on a local PC.

2. The software is owned and managed by a software provider rather than the organizations using the software.

3. Users may pay for the software according to how much use they make of it or through an annual or monthly subscription. Sometimes the software is free for anyone to use, but users must then agree to accept advertisements, which fund the software service.

The development of SaaS has accelerated over the past few years as cloud computing has become widely used. When a service is deployed in the cloud, the number of servers can quickly change to match the user demands for that service. There is no need for service providers to provision for peak loads; as a result, the costs for these providers have been dramatically reduced.

For software purchasers, the benefit of SaaS is that the costs of management of software are transferred to the provider. The provider is responsible for fixing bugs and installing software upgrades, dealing with changes to the operating system platform, and ensuring that hardware capacity can meet demand. Software license management costs are zero. If someone has several computers, there is no need to license software for all of these. If a software application is only used occasionally, the pay-per-use model may be cheaper than buying an application. The software may be accessed from mobile devices, such as smartphones, from anywhere in the world.

The main problem that inhibits the use of SaaS is data transfer with the remote service. Data transfer takes place at network speeds, and so transferring a large amount of data, such as video or high-quality images, takes a lot of time. You may also have to pay the service provider according to the amount of data transferred. Other problems are lack of control over software evolution (the provider may change the software when it wishes) and problems with laws and regulations. Many countries have laws governing the storage, management, preservation, and accessibility of data, and moving data to a remote service may breach these laws.

The notion of software as a service and service-oriented architectures (SOA), discussed in Chapter 18, are related, but they are not the same:

1. Software as a service is a way of providing functionality on a remote server with client access through a web browser. The server maintains the user’s data and state during an interaction session. Transactions are usually long transactions, for example, editing a document.


2. Service-oriented architecture is an approach to structuring a software system as a set of separate, stateless services. These services may be provided by multiple providers and may be distributed. Typically, transactions are short transactions where a service is called, does something, and then returns a result.

SaaS is a way of delivering application functionality to users, whereas SOA is an implementation technology for application systems. Systems that are implemented using SOA do not have to be accessed by users as web services. SaaS applications for business may be implemented using components rather than services. However, if SaaS is implemented using SOA, it becomes possible for applications to use service APIs to access the functionality of other applications. They can then be integrated into more complex systems. These systems are called mashups and are another approach to software reuse and rapid software development.

From a software development perspective, the process of service development has much in common with other types of software development. However, service construction is not usually driven by user requirements, but by the service provider’s assumptions about what users need. Accordingly, the software needs to be able to evolve quickly after the provider gets feedback from users on their requirements. Agile development with incremental delivery is therefore an effective approach for software that is to be deployed as a service.

Some software that is implemented as a service, such as Google Docs for web users, offers a generic experience to all users. However, businesses may wish to have specific services that are tailored to their own requirements. If you are implementing SaaS for business, you may base your software service on a generic service that is tailored to the needs of each business customer. Three important factors have to be considered:

1. Configurability How do you configure the software for the specific requirements of each organization?

2. Multi-tenancy How do you present each user of the software with the impression that they are working with their own copy of the system while, at the same time, making efficient use of system resources?

3. Scalability How do you design the system so that it can be scaled to accommodate an unpredictably large number of users?

The notion of product-line architectures, discussed in Chapter 16, is one way of configuring software for users who have overlapping but not identical requirements. You start with a generic system and adapt it according to the specific requirements of each user. This does not work for SaaS, however, for it would mean deploying a different copy of the service for each organization that uses the software. Rather, you need to design configurability into the system and provide a configuration interface that allows users to specify their preferences. You then use these preferences to adjust the behavior of the software dynamically as it is used. Configuration facilities may allow for:

1. Branding, where users from each organization are presented with an interface that reflects their own organization.

Figure 17.16 Configuration of a software system offered as a service

2. Business rules and workflows, where each organization defines its own rules that govern the use of the service and its data.

3. Database extensions, where each organization defines how the generic service data model is extended to meet its specific needs.

4. Access control, where service customers create individual accounts for their staff and define the resources and functions that are accessible to each of their users.

Figure 17.16 illustrates this situation. This diagram shows five users of the application service, who work for three different customers of the service provider. Users interact with the service through a customer profile that defines the service configuration for their employer.
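The profile mechanism of Figure 17.16 might be sketched like this. The Profile fields and the rendering behavior are invented for illustration; a real service would consult the profile for branding, workflows, database extensions, and access control, as listed above.

```python
# One shared application service; each request is interpreted through the
# calling customer's profile.
class Profile:
    def __init__(self, branding, workflow_rules):
        self.branding = branding              # per-customer look and feel
        self.workflow_rules = workflow_rules  # per-customer business rules

class ApplicationService:
    def __init__(self):
        self._profiles = {}                   # one profile per customer

    def register(self, customer, profile):
        self._profiles[customer] = profile

    def render_home(self, customer):
        # The shared service adjusts its behavior using the stored profile.
        profile = self._profiles[customer]
        return "%s portal, %d rule(s) active" % (
            profile.branding, len(profile.workflow_rules))

service = ApplicationService()
service.register("C1", Profile("XYZ Corp", ["approve-orders-over-1000"]))
service.register("C2", Profile("BigCorp", []))
print(service.render_home("C1"))   # → XYZ Corp portal, 1 rule(s) active
```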

Multi-tenancy is a situation in which many different users access the same system and the system architecture is defined to allow the efficient sharing of system resources. However, it must appear to users that they each have sole use of the system. Multi-tenancy involves designing the system so that there is an absolute separation between system functionality and system data. All operations must therefore be stateless so that they can be shared. Data must either be provided by the client or should be available in a storage system or database that can be accessed from any system instance.

A particular problem in multi-tenant systems is data management. The simplest way to provide data management is for all customers to have their own database, which they may use and configure as they wish. However, this requires the service provider to maintain many different database instances (one per customer) and to make these databases available on demand.

As an alternative, the service provider can use a single database, with different users being virtually isolated within that database. This is illustrated in Figure 17.17, where you can see that database entries also have a “tenant identifier” that links these entries to specific users. By using database views, you can extract the entries for each service customer and so present users from that customer with a virtual, personal database. This process can be extended to meet specific customer needs using the configuration features discussed above.
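The single-database scheme of Figure 17.17 can be sketched with SQLite. The table and view names here are invented; the point is that a tenant-identifier column plus a per-customer view yields a virtual, personal database.

```python
import sqlite3

# Single shared database: every row carries a tenant identifier.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (tenant INTEGER, cust_key TEXT, name TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (234, "C100", "XYZ Corp", "43, Anystreet, Sometown"),
        (234, "C110", "BigCorp", "2, Main St, Motown"),
        (435, "X234", "J. Bowie", "56, Mill St, Starville"),
        (592, "PP37", "R. Burns", "Alloway, Ayrshire"),
    ],
)

# A database view extracts one tenant's entries and hides the tenant column,
# so customer 234 appears to have a private database.
conn.execute(
    "CREATE VIEW tenant_234 AS "
    "SELECT cust_key, name, address FROM customers WHERE tenant = 234"
)
print(conn.execute("SELECT name FROM tenant_234 ORDER BY name").fetchall())
# → [('BigCorp',), ('XYZ Corp',)]
```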

Scalability is the ability of the system to cope with increasing numbers of users without reducing the overall quality of service that is delivered to any user. Generally, when considering scalability in the context of SaaS, you are considering “scaling out” rather than “scaling up.” Recall that scaling out means adding additional servers and so also increasing the number of transactions that can be processed in parallel.

Figure 17.17 A multi-tenant database

Tenant  Key   Name      Address
234     C100  XYZ Corp  43, Anystreet, Sometown
234     C110  BigCorp   2, Main St, Motown
435     X234  J. Bowie  56, Mill St, Starville
592     PP37  R. Burns  Alloway, Ayrshire

Scalability is a complex topic that I cannot cover in detail here, but the following are some general guidelines for implementing scalable software:

1. Develop applications where each component is implemented as a simple stateless service that may be run on any server. In the course of a single transaction, a user may therefore interact with instances of the same service that are running on several different servers.

2. Design the system using asynchronous interaction so that the application does not have to wait for the result of an interaction (such as a read request). This allows the application to carry on doing useful work while it is waiting for the interaction to finish.

3. Manage resources, such as network and database connections, as a pool so that no single server is likely to run out of resources.

4. Design your database to allow fine-grain locking. That is, do not lock out whole records in the database when only part of a record is in use.

5. Use a cloud PaaS platform, such as Google App Engine (Sanderson 2012), for system implementation. These platforms include mechanisms that will automatically scale out your system as the load increases.
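Guideline 2 can be illustrated with a small asyncio sketch, in which the application does other useful work while a simulated slow read is in flight. The function names and timings here are invented.

```python
import asyncio

# A slow interaction, such as a database or network read.
async def slow_read():
    await asyncio.sleep(0.05)        # simulates the read latency
    return "record"

# Useful work the application can do while the read is pending.
async def other_work(log):
    log.append("work done while waiting")

async def main():
    log = []
    read = asyncio.create_task(slow_read())  # start the read asynchronously
    await other_work(log)                    # not blocked by the pending read
    log.append(await read)                   # collect the result when needed
    return log

print(asyncio.run(main()))           # → ['work done while waiting', 'record']
```

Had the read been synchronous, the application would have sat idle for the full read latency before doing any other work.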

The notion of software as a service is a major paradigm shift for distributed computing. We have already seen consumer software and professional applications, such as Photoshop, move to this model of delivery. Increasingly, businesses are replacing their own systems, such as CRM and inventory systems, with cloud-based SaaS systems from external providers such as Salesforce. Specialized software companies that implement business applications prefer to provide SaaS because it simplifies software update and management.

SaaS represents a new way to think about the engineering of enterprise systems. It has always been helpful to think of systems as delivering services to users but, until SaaS, this view has involved using different abstractions, such as objects, when implementing the system. Where there is a closer match between user and system abstractions, the resultant systems are easier to understand, maintain, and evolve.


Key Points

- The benefits of distributed systems are that they can be scaled to cope with increasing demand, can continue to provide user services (even if some parts of the system fail), and they enable resources to be shared.

- Issues to be considered in the design of distributed systems include transparency, openness, scalability, security, quality of service, and failure management.

- Client–server systems are distributed systems in which the system is structured into layers, with the presentation layer implemented on a client computer. Servers provide data management, application, and database services.

- Client–server systems may have several tiers, with different layers of the system distributed to different computers.

- Architectural patterns for distributed systems include master–slave architectures, two-tier and multi-tier client–server architectures, distributed component architectures, and peer-to-peer architectures.

- Distributed component systems require middleware to handle component communications and to allow objects to be added to and removed from the system.

- Peer-to-peer architectures are decentralized architectures in which there are no distinguished clients and servers. Computations can be distributed over many systems in different organizations.

- Software as a service is a way of deploying applications as thin client–server systems, where the client is a web browser.

Further Reading

Peer-to-Peer: Harnessing the Power of Disruptive Technologies. Although this book does not have a lot of information on p2p architectures, it is an excellent introduction to p2p computing and discusses the organization and approach used in a number of p2p systems. (A. Oram (ed.), O'Reilly and Associates Inc., 2001).

"Turning Software into a Service." A good overview paper that discusses the principles of service-oriented computing. Unlike many papers on this topic, it does not conceal these principles behind a discussion of the standards involved. (M. Turner, D. Budgen, and P. Brereton, IEEE Computer, 36 (10), October 2003). http://dx.doi.org/10.1109/MC.2003.1236470

Distributed Systems, 5th ed. A comprehensive textbook that discusses all aspects of distributed systems design and implementation. It includes coverage of peer-to-peer systems and mobile systems. (G. Coulouris, J. Dollimore, T. Kindberg, and G. Blair, Addison-Wesley, 2011).

Engineering Software as a Service: An Agile Approach Using Cloud Computing. This book accompanies the authors' online course on the topic. A good practical book that is aimed at people new to this type of development. (A. Fox and D. Patterson, Strawberry Canyon LLC, 2014). http://www.saasbook.info


Website

PowerPoint slides for this chapter:
www.pearsonglobaleditions.com/Sommerville

Links to supporting videos:
http://software-engineering-book.com/videos/requirements-and-design/

Exercises

17.1. What do you understand by "scalability"? Discuss the differences between scaling up and scaling out, and explain when these different approaches to scalability may be used.

17.2. Explain why distributed software systems are more complex than centralized software systems, where all of the system functionality is implemented on a single computer.

17.3. Using an example of a remote procedure call, explain how middleware coordinates the interaction of computers in a distributed system.

17.4. What are the different logical layers in an application with a distributed client–server architecture?

17.5. You have been asked to design a secure system that requires strong authentication and authorization. The system must be designed so that communications between parts of the system cannot be intercepted and read by an attacker. Suggest the most appropriate client–server architecture for this system and, giving the reasons for your answer, propose how functionality should be distributed between the client and the server systems.

17.6. Your customer wants to develop a system for stock information where dealers can access information about companies and evaluate various investment scenarios using a simulation system. Each dealer uses this simulation in a different way, according to his or her experience and the type of stocks in question. Suggest a client–server architecture for this system that shows where functionality is located. Justify the client–server system model that you have chosen.

17.7. Using a distributed component approach, propose an architecture for a national theater booking system. Users can check seat availability and book seats at a group of theaters. The system should support ticket returns so that people may return their tickets for last-minute resale to other customers.

17.8. What is the fundamental problem with a two-tier client–server approach? Define how a multitier client–server approach overcomes this.

17.9. List the benefits that a distributed component model has when used for implementing distributed systems.

17.10. Your company wishes to move from using desktop applications to accessing the same functionality remotely as services. Identify three risks that might arise, and suggest how these risks may be reduced.


References

Bernstein, P. A. 1996. "Middleware: A Model for Distributed System Services." Comm. ACM 39 (2): 86–97. doi:10.1145/230798.230809.

Coulouris, G., J. Dollimore, T. Kindberg, and G. Blair. 2011. Distributed Systems: Concepts and Design, 5th ed. Harlow, UK: Addison-Wesley.

Holdener, A. T. 2008. Ajax: The Definitive Guide. Sebastopol, CA: O'Reilly & Associates.

McDougall, P. 2000. "The Power of Peer-to-Peer." Information Week (August 28, 2000). http://www.informationweek.com/801/peer.htm

Oram, A. 2001. Peer-to-Peer: Harnessing the Power of Disruptive Technologies. Sebastopol, CA: O'Reilly & Associates.

Orfali, R., D. Harkey, and J. Edwards. 1997. Instant CORBA. Chichester, UK: John Wiley & Sons.

Pope, A. 1997. The CORBA Reference Guide: Understanding the Common Object Request Broker Architecture. Harlow, UK: Addison-Wesley.

Sanderson, D. 2012. Programming with Google App Engine. Sebastopol, CA: O'Reilly Media Inc.

Sarris, S. 2013. HTML5 Unleashed. Indianapolis, IN: Sams Publishing.

Tanenbaum, A. S., and M. Van Steen. 2007. Distributed Systems: Principles and Paradigms, 2nd ed. Upper Saddle River, NJ: Prentice-Hall.

Wallach, D. S. 2003. "A Survey of Peer-to-Peer Security Issues." In Software Security: Theories and Systems, edited by M. Okada, B. C. Pierce, A. Scedrov, H. Tokuda, and A. Yonezawa, 42–57. Heidelberg: Springer-Verlag. doi:10.1007/3-540-36532-X_4.

18
Service-oriented software engineering

Objectives

The objective of this chapter is to introduce service-oriented software engineering as a way of building distributed applications using web services. When you have read this chapter, you will:

understand the basic notions of a web service, web service standards, and service-oriented architecture;
understand the idea of RESTful services and the important differences between RESTful and SOAP-based services;
understand the service engineering process that is intended to produce reusable web services;
understand how workflow-based service composition can be used to create service-oriented software that supports business processes.

Contents

18.1 Service-oriented architecture
18.2 RESTful services
18.3 Service engineering
18.4 Service composition


The development of the Web in the 1990s revolutionized organizational information exchange. Client computers could gain access to information on remote servers outside their own organizations. However, access was solely through a web browser, and direct access to the information by other programs was not practical. This meant that opportunistic connections between servers, where, for example, a program could query a number of catalogs from different suppliers, were not possible.

To get around this problem, web services were developed that allowed programs to access and update resources available on the web. Using a web service, organizations that wish to make their information accessible to other programs can do so by defining and publishing a programmatic web service interface. This interface defines the data available and how it can be accessed and used.

More generally, a web service is a standard representation for some computational or information resource that can be used by other programs. These may be information resources, such as a parts catalog; computer resources, such as a specialized processor; or storage resources. For example, an archive service could be implemented that permanently and reliably stores organizational data that, by law, has to be maintained for many years.

A web service is an instance of a more general notion of a service, which Lovelock et al. (Lovelock et al. 1996) defined as:

an act or performance offered by one party to another. Although the process may be tied to a physical product, the performance is essentially intangible and does not normally result in ownership of any of the factors of production.

Services are a natural development of software components where the component model is, in essence, a set of standards associated with web services. A web service can therefore be defined as:

A loosely coupled, reusable software component that encapsulates discrete functionality, which may be distributed and programmatically accessed. A web service is a service that is accessed using standard Internet and XML-based protocols.

A critical distinction between a service and a software component, as defined in component-based software engineering, is that services should be independent and loosely coupled. That is, they should always operate in the same way, irrespective of their execution environment. They should not rely on external components that may have different functional and non-functional behavior. Therefore, web services do not have a "requires" interface that, in CBSE, defines the other system components that must be present. A web service interface is simply a "provides" interface that defines the service functionality and parameters.

Service-oriented systems are a way of developing distributed systems where the system components are stand-alone services, executing on geographically distributed computers. Services are platform and implementation-language independent. Software systems can be constructed by composing local services and external services from different providers, with seamless interaction between the services in the system.

†Lovelock, C., Vandermerwe, S., and Lewis, B. (1996). Services Marketing. Englewood Cliffs, NJ: Prentice Hall.


As I discussed in Chapter 17, the ideas of "software as a service" and "service-oriented systems" are not the same thing. Software as a service means offering software functionality to users remotely over the web, rather than through applications installed on a user's computer. Service-oriented systems are systems that are implemented using reusable service components and that are accessed by other programs, rather than directly by users. Software that is offered as a service may be implemented using a service-oriented system. However, you don't have to implement software in this way to offer it as a user service.

Adopting a service-oriented approach to software engineering has a number of important benefits:

1. Services can be offered by any service provider inside or outside of an organization. Assuming these services conform to certain standards, organizations can create applications by integrating services from a range of providers. For example, a manufacturing company can link directly to services provided by its suppliers.

2. The service provider makes information about the service public so that any authorized user can use the service. The service provider and the service user do not need to negotiate about what the service does before it can be incorporated in an application program.

3. Applications can delay the binding of services until they are deployed or until execution. Therefore, an application using a stock price service (say) could, in principle, dynamically change service providers while the system was executing. This means that applications can be reactive and adapt their operation to cope with changes to their execution environment.

4. Opportunistic construction of new services is possible. A service provider may recognize new services that can be created by linking existing services in innovative ways.

5. Service users can pay for services according to their use rather than their provision. Therefore, instead of buying an expensive component that is rarely used, the application writer can use an external service that will be paid for only when required.

6. Applications can be made smaller, which is particularly important for mobile devices with limited processing and memory capabilities. Some computationally intensive processing and exception handling can be offloaded to external services.

Service-oriented systems have loosely coupled architectures where service bindings may change during system execution. A different, but equivalent, version of the service may therefore be executed at different times. Some systems will be built solely using web services, and others will mix web services with locally developed components. To illustrate how applications that use a mixture of services and components may be organized, consider the following scenario:

An in-car information system provides drivers with information on weather, road traffic conditions, local information, and so forth. This is linked to the car radio so that information is delivered as a signal on a specific radio channel. The car is equipped with a GPS receiver to discover its position, and, based on that position, the system accesses a range of information services. Information may then be delivered in the driver's specified language.

Figure 18.1 A service-based, in-car information system
[Diagram. The in-car software system comprises five modules: User interface (receives requests from the user), Receiver (receives information streams from services), Transmitter (sends position and information requests to services), Locator (discovers the car's position from gps coordinates), and Radio (translates the digital info stream to a radio signal). External services include a Mobile Info Service (collates information), a Service discovery service (finds available services), a Translator, and Road traffic info, Weather info, and Facilities info services.]

Figure 18.1 illustrates a possible organization for such a system. The in-car software includes five modules. These handle communications with the driver, with a GPS receiver that reports the car's position, and with the car radio. The Transmitter and Receiver modules handle all communications with external services.

The car communicates with an external mobile information service that aggregates information from a range of other services, providing information on weather, traffic, and local facilities. Different providers in different places offer these services, and the in-car system accesses an external discovery service to find the services available in the local area. The mobile information service also uses the discovery service to bind to the appropriate weather, traffic, and facilities services. The aggregated information is then sent to the car through a service that translates that information into the driver's preferred language.

This example illustrates one of the key advantages of the service-oriented approach. When the system is programmed or deployed, you don't have to decide what service provider should be used or what specific services should be accessed. As the car moves around, the in-car software uses the service discovery service to find the most useful local information service. Because of the use of a translation service, it can move across borders and make local information available to people who don't speak the local language.

I think that the service-oriented approach to software engineering is as important a development as object-oriented software engineering. Service-oriented systems are essential to cloud and mobile systems. Newcomer and Lomow (Newcomer and Lomow 2005), in their book on SOA, summarize the potential of service-oriented approaches, which is now being realized:

Driven by the convergence of key technologies and the universal adoption of Web services, the service-oriented enterprise promises to significantly improve corporate agility, speed time-to-market for new products and services, reduce IT costs and improve operational efficiency.

Building applications based on services allows companies and other organizations to cooperate and make use of each other's business functions. Thus, systems that involve extensive information exchange across company boundaries, such as supply chain systems where one company orders goods from another, can easily be automated. Service-based applications may be constructed by linking services from various providers using either a standard programming language or a specialized workflow language, as discussed in Section 18.4.

Initial work on service provision and implementation was heavily influenced by the failure of the software industry to agree on component standards. It was therefore standards-driven, with all of the main industrial companies involved in standards development. This led to a whole range of standards (WS* standards) and the notion of service-oriented architectures. These were proposed as architectures for service-based systems, with all service communication being standards-based. However, the standards proposed were complex and had a significant execution overhead. This problem has led many companies to adopt an alternative architectural approach, based on so-called RESTful services. A RESTful approach is simpler than a service-oriented architecture, but it is less suited to services that offer complex functionality. I discuss both of these architectural approaches in this chapter.

18.1 Service-oriented architecture

Service-oriented architecture (SOA) is an architectural style based on the idea that executable services can be included in applications. Services have well-defined, published interfaces, and applications can choose whether or not these are appropriate. An important idea underlying SOA is that the same service may be available from different providers and that applications could make a runtime decision of which service provider to use.

†Newcomer, E., and Lomow, G. (2005). Understanding SOA with Web Services. Boston: Addison-Wesley.

Figure 18.2 Service-oriented architecture
[Diagram. A service provider publishes a service description (WSDL) to a service registry; a service requestor finds the service in the registry and then binds to the provider's service using SOAP.]

Figure 18.3 Web service standards
[Diagram. A stack of standards built on XML technologies (XML, XSD, XSLT, ...): Transport (HTTP, HTTPS, SMTP, ...), Messaging (SOAP), Service definition (UDDI, WSDL), Process (WS-BPEL), and Support (WS-Security, WS-Addressing, ...).]

Figure 18.2 illustrates the structure of a service-oriented architecture. Service providers design and implement services and specify the interface to these services. They also publish information about these services in an accessible registry. Service requestors (sometimes called service clients) who wish to make use of a service discover the specification of that service and locate the service provider. They can then bind their application to that specific service and communicate with it, using standard service protocols.
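The publish-find-bind cycle just described can be caricatured in a few lines of Python. This is only a sketch: the in-memory dictionary stands in for a real registry, and the service name and canned weather reply are invented for illustration.

```python
# A toy, in-memory service registry standing in for a real one.
registry = {}

def publish(name, description, endpoint):
    # Provider side: publish a service description to the registry.
    registry[name] = {"description": description, "endpoint": endpoint}

def find(name):
    # Requestor side: discover the specification of a service.
    return registry[name]

# The provider publishes a (canned) weather service.
publish("weatherInfo", "max/min temperatures by place and date",
        endpoint=lambda town, day: {"max": 9, "min": 2})

# The requestor finds the service, then binds to it and calls it.
service = find("weatherInfo")
reply = service["endpoint"]("Edinburgh", "2014-03-01")
print(reply["max"])
```

In a real SOA, the find step would return a WSDL description and the bind step would involve SOAP messaging rather than a direct function call.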

The development and use of internationally agreed standards is fundamental to SOA. As a result, service-oriented architectures have not suffered from the incompatibilities that normally arise with technical innovations, where different suppliers maintain their proprietary version of the technology. Figure 18.3 shows the stack of key standards that have been established to support web services.

Web service protocols cover all aspects of service-oriented architectures, from the basic mechanisms for service information exchange (SOAP) to programming language standards (WS-BPEL). These standards are all based on XML, a human- and machine-readable notation that allows the definition of structured data where text is tagged with a meaningful identifier. XML has a range of supporting technologies, such as XSD for schema definition, which are used to extend and manipulate XML descriptions. Erl (Erl 2004) provides a good summary of XML technologies and their role in web services.
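As a tiny illustration of this kind of tagged, structured data, the fragment below (whose element names are invented for the example) can be read by both people and programs; here it is parsed with Python's standard library:

```python
import xml.etree.ElementTree as ET

# A hypothetical XML fragment: text tagged with meaningful identifiers.
entry = """
<part>
  <name>valve</name>
  <price currency="GBP">12.50</price>
</part>
"""

root = ET.fromstring(entry)
print(root.find("name").text)              # element content: valve
print(root.find("price").get("currency"))  # attribute value: GBP
```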

Briefly, the fundamental standards for service-oriented architectures are:

1. SOAP This is a message interchange standard that supports communication between services. It defines the essential and optional components of messages passed between services. Services in a service-oriented architecture are sometimes called SOAP-based services.

2. WSDL The Web Service Description Language (WSDL) is a standard for service interface definition. It sets out how the service operations (operation names, parameters, and their types) and service bindings should be defined.

3. WS-BPEL This is a standard for a workflow language that is used to define process programs involving several different services. I explain what process programs are in Section 18.4.

The UDDI (Universal Description, Discovery, and Integration) discovery standard defines the components of a service specification intended to help potential users discover the existence of a service. This standard was meant to allow companies to set up registries, with UDDI descriptions defining the services they offered. Some companies set up UDDI registries in the early years of the 21st century, but users preferred standard search engines to find services. All public UDDI registries have now closed.

The principal SOA standards are supported by a range of supporting standards that focus on more specialized aspects of SOA. There are many supporting standards because they are intended to support SOA in different types of enterprise application. Some examples of these standards include:

1. WS-Reliable Messaging, a standard for message exchange that ensures messages will be delivered once and once only.

2. WS-Security, a set of standards supporting web service security, including standards that specify the definition of security policies and standards that cover the use of digital signatures.

3. WS-Addressing, which defines how address information should be represented in a SOAP message.

4. WS-Transactions, which defines how transactions across distributed services should be coordinated.

Web service standards are a huge topic, and I don't have space to discuss them in detail here. I recommend Erl's books (Erl 2004, 2005) for an overview of these standards. Their detailed descriptions are also available as public documents on the Web (W3C 2013).

18.1.1 Service components in an SOA

Message exchange, as I explained in Section 17.1, is an important mechanism for coordinating actions in a distributed computing system. Services in an SOA communicate by exchanging messages, expressed in XML, and these messages are distributed using standard Internet transport protocols such as HTTP and TCP/IP.

A service defines what it needs from another service by setting out its requirements in a message, which is sent to that service. The receiving service parses the message, carries out the computation, and, upon completion, sends a reply, as a message, to the requesting service. This service then parses the reply to extract the required information. Unlike software components, services do not use remote procedure or method calls to access functionality associated with other services.

Figure 18.4 Organization of a WSDL specification
[Diagram. A WSDL service definition comprises an intro (XML namespace declarations), an abstract interface (type, interface, and message declarations), and a concrete implementation (binding and endpoint declarations).]
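This request/reply exchange can be sketched with Python's standard library. The message format below is invented for illustration and is deliberately simplified; real SOA messages would be SOAP envelopes carried over a transport protocol rather than a direct function call.

```python
import xml.etree.ElementTree as ET

# Requestor: build an XML request message (hypothetical format).
request = ET.Element("request", {"service": "stockCheck"})
ET.SubElement(request, "partNumber").text = "P1001"
request_text = ET.tostring(request, encoding="unicode")

def service(message_text):
    # Receiving service: parse the message, carry out the
    # computation, and send back a reply as a message.
    msg = ET.fromstring(message_text)
    part = msg.find("partNumber").text
    reply = ET.Element("reply")
    ET.SubElement(reply, "partNumber").text = part
    ET.SubElement(reply, "inStock").text = "42"
    return ET.tostring(reply, encoding="unicode")

# Requestor: parse the reply to extract the required information.
reply = ET.fromstring(service(request_text))
print(reply.find("inStock").text)
```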

When you intend to use a web service, you need to know where the service is located (its Uniform Resource Identifier—URI) and the details of its interface. These details are provided in a service description that is written in an XML-based language called WSDL (Web Service Description Language). The WSDL specification defines three aspects of a Web service: what the service does, how it communicates, and where to find it:

1. The "what" part of a WSDL document, called an interface, specifies what operations the service supports and defines the format of the messages sent and received by the service.

2. The "how" part of a WSDL document, called a binding, maps the abstract interface to a concrete set of protocols. The binding specifies the technical details of how to communicate with a Web service.

3. The "where" part of a WSDL document describes the location of a specific Web service implementation (its endpoint).

The WSDL conceptual model (Figure 18.4) shows the elements of a service description. Each element is expressed in XML and may be provided in separate files. These elements are:

1. An introductory part that usually defines the XML namespaces used and that may include a documentation section providing additional information about the service.

2. An optional description of the types used in the messages exchanged by the service.

3. A description of the service interface, that is, the operations that the service provides for other services or users.

4. A description of the input and output messages processed by the service.

5. A description of the binding used by the service, that is, the messaging protocol that will be used to send and receive messages. The default is SOAP, but other bindings may also be specified. The binding sets out how the input and output messages associated with the service should be packaged into a message, and specifies the communication protocols used. The binding may also specify how supporting information, such as security credentials or transaction identifiers, is included in messages to the service.

6. An endpoint specification that is the physical location of the service, expressed as a URI—the address of a resource that can be accessed over the Internet.

Figure 18.5 Part of a WSDL description for a web service

Define some of the types used. Assume that the namespace prefix 'xs' refers to the namespace URI for XML Schemas and the namespace prefix associated with this definition is weathns.

<types>
   <xs:schema targetNamespace="http://.../weathns"
              xmlns:weathns="http://.../weathns">
      <xs:element name="PlaceAndDate" type="pdrec" />
      <xs:element name="MaxMinTemp" type="mmtrec" />
      <xs:element name="InDataFault" type="errmess" />
      <xs:complexType name="pdrec">
         <xs:sequence>
            <xs:element name="town" type="xs:string" />
            <xs:element name="country" type="xs:string" />
            <xs:element name="day" type="xs:date" />
         </xs:sequence>
      </xs:complexType>
      Definitions of MaxMinTemp and InDataFault here
   </xs:schema>
</types>

Now define the interface and its operations. In this case, there is only a single operation to return maximum and minimum temperatures.

<interface name="weatherInfo">
   <operation name="getMaxMinTemps" pattern="wsdlns:in-out">
      <input messageLabel="In" element="weathns:PlaceAndDate" />
      <output messageLabel="Out" element="weathns:MaxMinTemp" />
      <outfault messageLabel="Out" element="weathns:InDataFault" />
   </operation>
</interface>

Figure 18.5 shows part of the interface for a simple service that, given a date and a place, specified as a town within a country, returns the maximum and minimum temperature recorded in that place on that date. The input message also specifies whether these temperatures are to be returned in degrees Celsius or degrees Fahrenheit.

XML-based service descriptions include definitions of XML namespaces. A namespace identifier may precede any identifier used in the XML description, making it possible to distinguish between identifiers with the same name that have been defined in different parts of an XML description. You don't have to understand the details of namespaces to understand the examples here. You only need to know that names may be prefixed with a namespace identifier and that the namespace:name pair should be unique.

In Figure 18.5, the first part of the description shows part of the element and type definition that is used in the service specification. This defines the elements PlaceAndDate, MaxMinTemp, and InDataFault. I have only included the specification of PlaceAndDate, which you can think of as a record with three fields: town, country, and date. A similar approach would be used to define MaxMinTemp and InDataFault.

The second part of the description shows how the service interface is defined. In this example, the service weatherInfo has a single operation, although there are no restrictions on the number of operations that may be defined. The weatherInfo operation has an associated in-out pattern, meaning that it takes one input message and generates one output message. The WSDL 2.0 specification allows for a number of message exchange patterns, such as in-only, in-out, out-only, in-optional-out, and out-in. The input and output messages, which refer to the definitions made earlier in the types section, are then defined.
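A client of getMaxMinTemps would construct an input message conforming to the PlaceAndDate type. A sketch of that construction using Python's standard library follows; the namespace URI and the field values are invented (the real URI is elided in Figure 18.5), and the serialization is simplified relative to a full SOAP message:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the weathns namespace URI from Figure 18.5.
WEATHNS = "http://example.com/weathns"

# Build a PlaceAndDate element with the three fields the schema defines.
msg = ET.Element(f"{{{WEATHNS}}}PlaceAndDate")
for tag, value in [("town", "Edinburgh"), ("country", "UK"),
                   ("day", "2014-03-01")]:
    ET.SubElement(msg, f"{{{WEATHNS}}}{tag}").text = value

# Serialize the message for transmission to the service.
print(ET.tostring(msg, encoding="unicode"))
```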

A service interface that is defined in WSDL is simply a description of the service signature, that is, the operations and their parameters. It does not include any information about the semantics of the service or its non-functional characteristics, such as performance and dependability. If you plan to use the service, you have to work out what the service actually does and the meaning of the input and output messages. You have to experiment to discover the service's performance and dependability. While meaningful names and documentation help with understanding the service functionality, it is still possible to misunderstand what the service actually does.

XML-based service descriptions are long, detailed, and tedious to read. WSDL specifications are not normally written by hand, and most of the information in a specification is automatically generated.

18.2 RESTful services

The initial developments of web services and service-oriented software engineering were standards-based, with XML-based messages exchanged between services. This is a general approach that allows for the development of complex services, dynamic service binding, and control over quality of service and service dependability. However, as services were developed, it emerged that most of these were single-function services with relatively simple input and output interfaces. Service users were not really interested in dynamic binding and the use of multiple service providers. They rarely use web service standards for quality of service, reliability, and so forth.

The problem is that web services standards are “heavyweight” standards

that are

sometimes overly general and inefficient. Implementing these standards

requires a considerable amount of processing to create, transmit, and

interpret the associated XML

messages. This slows down communications between services, and, for

high-throughput

systems, additional hardware may be required to deliver the quality of

service required.

In response to this situation, an alternative "lightweight" approach to web service architecture has been developed. This approach is based on the REST architectural style, where REST stands for Representational State Transfer (Fielding 2000). REST is an architectural style based on transferring representations of resources from a server to a client. It is the style that underlies the web as a whole and has been used as a much simpler method than SOAP/WSDL for implementing web service interfaces.

530 Chapter 18 Service-oriented software engineering

Figure 18.6 Resources and actions. (a) General resource actions: CREATE, READ, UPDATE, and DELETE on a resource R. (b) Web resources: POST, GET, PUT, and DELETE on a web-accessible resource R.

The fundamental element in a RESTful architecture is a resource. Essentially, a resource is simply a data element, such as a catalog or a medical record, or a document, such as this book chapter. In general, resources may have multiple representations; that is, they can exist in different formats. For example, this book chapter has three representations: an MS Word representation, which is used for editing; a PDF representation, which is used for web display; and an InDesign representation, which is used for publishing. The underlying logical resource made up of text and images is the same in all of these representations.

In a RESTful architecture, everything is represented as a resource. Resources have a unique identifier, which is their URL. Resources are a bit like objects, with four fundamental operations associated with them, as shown in Figure 18.6(a):

1. Create—bring the resource into existence.

2. Read—return a representation of the resource.

3. Update—change the value of the resource.

4. Delete—make the resource inaccessible.

The Web is an example of a system that has a RESTful architecture. Web pages are resources, and the unique identifier of a web page is its URL. The web protocols http and https are based on four actions, namely, POST, GET, PUT, and DELETE. These map onto the basic resource operations, as I have shown in Figure 18.6(b):

1. POST is used to create a resource. It has associated data that defines the resource.

2. GET is used to read the value of a resource and return that to the requestor in the specified representation, such as XHTML, that can be rendered in a web browser.

3. PUT is used to update the value of a resource.

4. DELETE is used to delete the resource.
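The mapping between the four HTTP actions and the basic resource operations can be sketched as a minimal in-memory resource store. This is an illustration only; the class and method names are assumptions, not part of any standard API:

```python
# Minimal sketch of how POST/GET/PUT/DELETE map onto
# create/read/update/delete for web-accessible resources.
class ResourceStore:
    def __init__(self):
        self._resources = {}   # maps URL -> resource representation
        self._next_id = 1

    def post(self, data):
        """CREATE: bring a new resource into existence; return its URL."""
        url = f"/resource/{self._next_id}"
        self._next_id += 1
        self._resources[url] = data
        return url

    def get(self, url):
        """READ: return a representation of the resource."""
        return self._resources[url]

    def put(self, url, data):
        """UPDATE: change the value of the resource."""
        self._resources[url] = data

    def delete(self, url):
        """DELETE: make the resource inaccessible."""
        del self._resources[url]
```

A client would then create a resource with `post`, retrieve it by its returned URL with `get`, and so on, mirroring the HTTP action sequence.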

All services, in some way, operate on data. For example, the service described in Section 18.1 that returns the maximum and minimum temperatures for a location on a given date uses a weather information database. SOAP-based services execute actions on this database to return particular values from it. RESTful services (Richardson and Ruby 2007) access the data directly.

When a RESTful approach is used, the data is exposed and is accessed using its URL. RESTful services use http or https protocols, with the only allowed actions being POST, GET, PUT, and DELETE. Therefore, the weather data for each place in the database might be accessed using URLs such as:

http://weather-info-example.net/temperatures/boston
http://weather-info-example.net/temperatures/edinburgh

This would invoke the GET operation and return a list of maximum and minimum temperatures. To request the temperatures for a specific date, a URL query can be used:

http://weather-info-example.net/temperatures/edinburgh?date=20140226

URL queries can also be used to disambiguate the request, given that there may be several places in the world with the same name:

http://weather-info-example.net/temperatures/boston?date=20140226&country=USA&state=Mass
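Query URLs like these can be built with the standard library rather than by string concatenation, which ensures the parameters are correctly encoded. The base URL is the book's illustrative weather service address; the helper function name is an assumption:

```python
# Build a weather-service query URL from a place name and
# optional query parameters, encoding them safely.
from urllib.parse import urlencode

def temperature_url(place, **query):
    base = "http://weather-info-example.net/temperatures/" + place
    return base + ("?" + urlencode(query) if query else "")
```

For example, `temperature_url("boston", date="20140226", country="USA", state="Mass")` reproduces the disambiguated URL shown above.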

An important difference between RESTful services and SOAP-based services is that RESTful services are not exclusively XML-based. So, when a resource is requested, created, or changed, the representation may be specified. This is important for RESTful services because representations such as JSON (JavaScript Object Notation), as well as XML, may be used. These can be processed more efficiently than XML-based notations, thus reducing the overhead involved in a service call. Therefore, the above request for maximum and minimum temperatures for Boston may return the following information:

{
  "place": "Boston",
  "country": "USA",
  "state": "Mass",
  "date": "26 Feb 2014",
  "units": "Fahrenheit",
  "max temp": 41,
  "min temp": 29
}
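One reason JSON responses are cheap to process is that they map directly onto the data structures of most languages. As a sketch, the response above can be parsed in a single call with Python's standard `json` module:

```python
# Parse the weather-service JSON response shown above into a dictionary.
import json

response = '''{
  "place": "Boston", "country": "USA", "state": "Mass",
  "date": "26 Feb 2014", "units": "Fahrenheit",
  "max temp": 41, "min temp": 29
}'''

data = json.loads(response)
```

The fields are then ordinary dictionary entries, e.g. `data["max temp"]`.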

The response to a GET request in a RESTful service may include URLs. Therefore, if the response to a request is a set of resources, then the URL of each of these resources may be included. The requesting service may then process the requests in its own way. Therefore, a request for weather information given a place name that is not unique may return the URLs of all of the places that match the request. For example:

http://weather-info-example.net/temperatures/edinburgh-scotland
http://weather-info-example.net/temperatures/edinburgh-australia
http://weather-info-example.net/temperatures/edinburgh-maryland

532 Chapter 18 Service-oriented software engineering

Figure 18.7 RESTful and SOAP-based APIs: two service requestors access the same resource R, one through a RESTful API and the other through a SOAP-based service.

A fundamental design principle for RESTful services is that they should be stateless. That is, in an interaction session, the resource itself should not include any state information, such as the time of the last request. Instead, all necessary state information should be returned to the requestor. If state information is required in later requests, it should be returned to the server by the requestor.
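The stateless principle can be sketched with a paged query: the server keeps no per-client session state; instead, each response hands back the state (here, the next offset) that the client must include in its next request. The data and function names are illustrative assumptions:

```python
# Stateless paging sketch: the continuation state travels with the
# client, not in the server. Each response tells the client what to
# send back to get the next page.
TEMPERATURES = [("2014-02-24", 39), ("2014-02-25", 40), ("2014-02-26", 41)]

def get_page(offset=0, page_size=2):
    page = TEMPERATURES[offset:offset + page_size]
    more = offset + page_size < len(TEMPERATURES)
    return {"items": page, "next_offset": offset + page_size if more else None}
```

The client loops, passing `next_offset` back to the server until it is `None`; the server never has to remember which client has seen which page.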

RESTful services have become more widely used over the past few years because of the widespread use of mobile devices. These devices have limited processing capabilities, so the lower overhead of RESTful services allows better system performance. They are also easy to use with existing websites—implementing a RESTful API for a website is usually fairly straightforward.

However, there are problems with the RESTful approach:

1. When a service has a complex interface and is not a simple resource, it can be difficult to design a set of RESTful services to represent this interface.

2. There are no standards for RESTful interface description, so service users must rely on informal documentation to understand the interface.

3. When you use RESTful services, you have to implement your own infrastructure for monitoring and managing the quality of service and the service reliability. SOAP-based services have additional infrastructure support standards such as WS-Reliability and WS-Transactions.

Pautasso et al. (Pautasso, Zimmermann, and Leymann 2008) discuss when RESTful and SOAP-based approaches should be used. However, it is often possible to provide both SOAP-based and RESTful interfaces to the same service or resource (Figure 18.7). This dual approach is now common for cloud services from providers such as Microsoft, Google, and Amazon. Service clients can then choose the service access method that is best suited to their applications.

Figure 18.8 The service engineering process: service candidate identification produces the service requirements; service design produces the service interface specification; service implementation and deployment produces the validated and deployed service.

18.3 Service engineering

Service engineering is the process of developing services for reuse in service-oriented applications. It has much in common with component engineering. Service engineers have to ensure that the service represents a reusable abstraction that could be useful in different systems. They must design and develop generally useful functionality associated with that abstraction and ensure that the service is robust and reliable. They have to document the service so that it can be discovered and understood by potential users.

As shown in Figure 18.8, there are three logical stages in the service engineering process:

1. Service candidate identification, where you identify possible services that might be implemented and define the service requirements.

2. Service design, where you design the logical service interface and its implementation interfaces (SOAP-based and/or RESTful).

3. Service implementation and deployment, where you implement and test the service and make it available for use.

As I discussed in Chapter 16, the development of a reusable component may start with an existing component that has already been implemented and used in an application. The same is true for services—the starting point for this process will often be an existing service or a component that is to be converted to a service. In this situation, the design process involves generalizing the existing component so that application-specific features are removed. Implementation means adapting the component by adding service interfaces and implementing the required generalizations.

18.3.1 Service candidate identification

The basic idea of service-oriented computing is that services should support business processes. As every organization has a wide range of processes, many possible services may be implemented. Service candidate identification therefore involves understanding and analyzing the organization's business processes to decide which reusable services could be implemented to support these processes.

Figure 18.9 Service classification:

Task-oriented services. Utility: Currency converter, Employee locator. Business: Validate claim form, Check credit rating. Coordination: Process expense claim, Pay external supplier.

Entity-oriented services. Utility: Document translator, Web form to XML converter. Business: Expenses form, Student application form.

Erl (Erl 2005) suggests that there are three fundamental types of service:

1. Utility services. These services implement some general functionality that may be used by different business processes. An example of a utility service is a currency conversion service that can be accessed to compute the conversion of one currency (e.g., dollars) to another (e.g., euros).

2. Business services. These services are associated with a specific business function. An example of a business function in a university would be the registration of students for a course.

3. Coordination or process services. These services support a more general business process that usually involves different actors and activities. An example of a coordination service in a company is an ordering service that allows orders to be placed with suppliers, goods accepted, and payments made.

Erl also suggests that services can be thought of as task-oriented or entity-oriented. Task-oriented services are associated with some activity, whereas entity-oriented services are associated with a system resource. The resource is a business entity such as a job application form. Figure 18.9 shows examples of services that are task-oriented or entity-oriented. Utility or business services may be entity-oriented or task-oriented. Coordination services are always task-oriented.

Your goal in service candidate identification should be to identify services that are logically coherent, independent, and reusable. Erl's classification is helpful in this respect, as it suggests how to discover reusable services by looking at business entities as resources and business activities. However, identifying service candidates is sometimes difficult because you have to envisage how the services could be used. You have to think of possible candidates and then ask a series of questions about them to see if they are likely to be useful services. Possible questions that you might ask to identify potentially reusable services are:

1. For an entity-oriented service, is the service associated with a single logical resource that is used in different business processes? What operations are normally performed on that entity that must be supported? Do these fit with the RESTful service operations POST, GET, PUT, and DELETE?

2. For a task-oriented service, is the task one that is carried out by different people in the organization? Will they be willing to accept the inevitable standardization that occurs when a single support service is provided? Can this fit into the RESTful model, or should it be redesigned as an entity-oriented service?

3. Is the service independent? That is, to what extent does it rely on the availability of other services?

4. Does the service have to maintain state? If state information is required, this must either be maintained in a database or passed as a parameter to the service. Using a database affects service reusability as there is a dependency between the service and the required database. In general, services where the state is passed to the service are easier to reuse, as no database binding is required.

5. Might this service be used by external clients? For example, an entity-oriented service associated with a catalog could be made available to both internal and external users.

6. Are different users of the service likely to have different non-functional requirements? If they do, then more than one version of a service should perhaps be implemented.

The answers to these questions help you select and refine abstractions that can be implemented as services. However, there is no formulaic way of deciding which are the best services. You need to use your experience and business knowledge to decide on the most appropriate services.

The output of the service selection process is a set of identified services and associated requirements for these services. The functional service requirements should define what the service should do. The non-functional requirements should define the security, performance, and availability requirements of the service.

To help you understand the process of service candidate identification and implementation, consider the following example:

A company, which sells computer equipment, has arranged special prices for approved configurations for some large customers. To facilitate automated ordering, the company wishes to produce a catalog service that will allow customers to select the equipment that they need. Unlike a consumer catalog, orders are not placed directly through a catalog interface. Instead, goods are ordered through the web-based procurement system of each company that accesses the catalog as a web service. The reason for this is that large companies usually have their own budgeting and approval procedures for orders that must be followed when an order is placed.

The catalog service is an example of an entity-oriented service, where the underlying resource is the catalog. The functional catalog service requirements are as follows:

1. A specific version of the catalog shall be provided for each user company. This shall include the approved configurations and equipment that may be ordered by employees of the customer company and the equipment prices that have been agreed to with that company.

2. The catalog shall allow a customer employee to download a version of the catalog for offline browsing.

3. The catalog shall allow users to compare the specifications and prices of up to six catalog items.

4. The catalog shall provide browsing and search facilities for users.

5. Users of the catalog shall be able to discover the predicted delivery date for a given number of specific catalog items.

6. Users of the catalog shall be able to place "virtual orders" where the items required will be reserved for them for 48 hours. Virtual orders must be confirmed by a real order placed by a procurement system. The real order must be received within 48 hours of the virtual order.

In addition to these functional requirements, the catalog has a number of non-functional requirements:

1. Access to the catalog service shall be restricted to employees of accredited organizations.

2. The prices and configurations offered to each customer shall be confidential, and access to these shall only be provided to employees of that customer.

3. The catalog shall be available without disruption of service from 0700 GMT to 1100 GMT.

4. The catalog service shall be able to process up to 100 requests per second at peak load.

There is no non-functional requirement related to the response time of the catalog service. This depends on the size of the catalog and the expected number of simultaneous users. As this is not a time-critical service, there is no need to specify the required performance at this stage.

18.3.2 Service interface design

Once you have identified candidate services, the next stage in the service engineering process is to design the service interfaces. This involves defining the operations associated with the service and their parameters. If SOAP-based services are used, you have to design the input and output messages. If RESTful services are used, you have to think about the resources required and how the standard operations should be used to implement the service operations.

The starting point for service interface design is abstract interface design, where you identify the entities and the operations associated with the service, their inputs and outputs, and the exceptions associated with these operations. You then need to think about how this abstract interface is realized as SOAP-based or RESTful services.

Figure 18.10 Catalog operations:

MakeCatalog: Creates a version of the catalog tailored for a specific customer. Includes an optional parameter to create a downloadable PDF version of the catalog.

Lookup: Displays all of the data associated with a specified catalog item.

Search: Takes a logical expression and searches the catalog according to that expression. It displays a list of all items that match the search expression.

Compare: Provides a comparison of up to six characteristics (e.g., price, dimensions, processor speed, etc.) of up to four catalog items.

CheckDelivery: Returns the predicted delivery date for an item if ordered that day.

MakeVirtualOrder: Reserves the number of items to be ordered by a customer and provides item information for the customer's own procurement system.
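The catalog operations of Figure 18.10 can be sketched as an abstract interface. The method and parameter names below are assumptions drawn from the inputs listed in the catalog interface design (Figure 18.11), not definitions from the text:

```python
# Abstract sketch of the catalog service interface. Concrete
# SOAP-based or RESTful realizations would subclass this.
from abc import ABC, abstractmethod

class CatalogService(ABC):
    @abstractmethod
    def make_catalog(self, company_id, pdf_flag=False): ...

    @abstractmethod
    def lookup(self, catalog_url, catalog_number): ...

    @abstractmethod
    def search(self, catalog_url, search_expression): ...

    @abstractmethod
    def compare(self, catalog_url, attributes, catalog_numbers): ...

    @abstractmethod
    def check_delivery(self, company_id, catalog_number, num_items): ...

    @abstractmethod
    def make_virtual_order(self, company_id, catalog_number, num_items): ...
```

Writing the abstract interface first keeps the SOAP/REST realization decision separate from the definition of the operations themselves.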

If you choose a SOAP-based approach, you have to design the structure of the XML messages that are sent and received by the service. The operations and messages are the basis of an interface description written in WSDL. If you choose a RESTful approach, you have to design how the service operations map onto the RESTful operations.

Abstract interface design starts with the service requirements and defines the operation names and parameters. At this stage, you should also define the exceptions that may arise when a service operation is invoked. Figure 18.10 shows the catalog operations that implement the requirements. There is no need for these to be specified in detail; you add detail at the next stage of the design process.

Once you have established an informal description of what the service should do, the next stage is to add more detail of the service inputs and outputs. I have shown this for the catalog service in Figure 18.11, which extends the functional description in Figure 18.10.

Defining exceptions and how these exceptions can be communicated to service users is particularly important. Service engineers do not know how their services will be used. It is usually unwise to make assumptions that service users will have completely understood the service specification. Input messages may be incorrect, so you should define exceptions that report incorrect inputs to the service client. It is generally good practice in reusable component development to leave all exception handling to the user of the component. Service developers should not impose their views on how exceptions should be handled.

In some cases, a textual description of the operations and their inputs and outputs is all that is required. The detailed realization of the service is left as an implementation decision. Sometimes, however, you need to have a more detailed design, and a detailed interface description can be specified in a graphical notation such as the UML or in a readable description format such as JSON. Figure 18.12, which describes the inputs and outputs for the getDelivery operation, shows how you can use the UML to describe the interface in detail.

Figure 18.11 Catalog interface design:

MakeCatalog. Inputs (mcIn): Company id, PDF-flag. Outputs (mcOut): URL of the catalog for that company. Exceptions (mcFault): Invalid company id.

Lookup. Inputs (lookIn): Catalog URL, Catalog number. Outputs (lookOut): URL of page with the item information. Exceptions (lookFault): Invalid catalog number.

Search. Inputs (searchIn): Catalog URL, Search string. Outputs (searchOut): URL of web page with search results. Exceptions (searchFault): Badly formed search string.

Compare. Inputs (compIn): Catalog URL, Entry attribute (up to 6), Catalog number (up to 4). Outputs (compOut): URL of page showing comparison table. Exceptions (compFault): Invalid company id, Invalid catalog number, Unknown attribute.

CheckDelivery. Inputs (cdIn): Company id, Catalog number, Number of items required. Outputs (cdOut): Expected delivery date. Exceptions (cdFault): Invalid company id, No availability, Zero items requested.

MakeVirtualOrder. Inputs (voIn): Company id, Catalog number, Number of items required. Outputs (voOut): Catalog number, Number of items required, Predicted delivery date, Unit price estimate, Total price estimate. Exceptions (voFault): Invalid company id, Invalid catalog number, Zero items requested.

Figure 18.12 UML definition of input and output messages:

cdIn: cID: string, where size (cID) = 6; catNum: string, where size (catNum) = 10; numItems: integer, where numItems > 0.

cdOut: catNum: string, where size (catNum) = 10; delivDate: date, where delivDate > Today.

cdFault: errCode: integer, where errCode = 1 means Invalid company id; errCode = 2, Invalid catalog number; errCode = 3, No availability; errCode = 4, Zero items requested.

Notice how I have added detail to the description by annotating the UML diagram with constraints. These details define the length of the strings representing the company and the catalog item, and specify that the number of items must be greater than zero and that delivery must be after the current date. The annotations also show which error codes are associated with each possible fault.
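Constraints like these translate directly into input-validation code in the service implementation. The sketch below enforces the cdIn constraints of Figure 18.12 and returns the matching cdFault error codes; it is an illustration under the assumption that any cID of the right length is valid, and availability checking (error code 3) is omitted:

```python
# Validate a cdIn message against the Figure 18.12 constraints.
# Returns 0 when valid, otherwise the cdFault errCode.
def validate_cd_in(cID, catNum, numItems):
    if len(cID) != 6:        # constraint: size (cID) = 6
        return 1             # Invalid company id
    if len(catNum) != 10:    # constraint: size (catNum) = 10
        return 2             # Invalid catalog number
    if numItems <= 0:        # constraint: numItems > 0
        return 4             # Zero items requested
    return 0
```

Keeping the error codes identical to the interface specification means a client can interpret faults without consulting the implementation.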

The catalog service is an example of a practical service, which illustrates that it is not always straightforward to decide whether to choose a RESTful or a SOAP-based approach to service implementation. As an entity-based service, the catalog can be represented as a resource, which suggests that a RESTful model is the right one to use. However, operations on the catalog are not simple GET operations, and you need to maintain some state in an interaction session with the catalog. This suggests the use of a SOAP-based approach. Such dilemmas are common in service engineering, and usually local circumstances (e.g., availability of expertise) are a major factor in the decision of which approach to use.

To implement a set of RESTful services, you have to decide on the set of resources that will be used to represent the catalog and how the fundamental GET, POST, and PUT operations will operate on these resources. Some of these design decisions are straightforward:

1. There should be a resource representing a company-specific catalog. This should have a URL of the form <base catalog>/<company name> and should be created using a POST operation.

2. Each catalog item should have its own URL of the form <base catalog>/<company name>/<item identifier>.

3. You use the GET operation to retrieve items. Lookup is implemented by using the URL of an item in a catalog as the GET parameter. Search is implemented by using GET with the company catalog as the URL and the search string as a query parameter. This GET operation returns a list of URLs of the items matching the search.
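The resource-naming scheme in points 1–3 can be sketched as a few URL-building helpers. The base URL is hypothetical; only the path structure follows the design decisions above:

```python
# Sketch of the catalog resource URL scheme:
#   <base catalog>/<company name>              - company-specific catalog
#   <base catalog>/<company name>/<item id>    - individual catalog item
#   <base catalog>/<company name>?search=...   - search within a catalog
from urllib.parse import quote

BASE = "http://catalog-example.net/catalog"

def company_catalog_url(company):
    return f"{BASE}/{company}"

def item_url(company, item_id):
    return f"{BASE}/{company}/{item_id}"

def search_url(company, search_string):
    return f"{BASE}/{company}?search={quote(search_string)}"
```

Lookup then becomes a GET on `item_url(...)`, and Search a GET on `search_url(...)`.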

However, the Compare, CheckDelivery, and MakeVirtualOrder operations are more complex:

1. The Compare operation can be implemented as a sequence of GET operations to retrieve the individual items, followed by a POST operation to create the comparison table and a final GET operation to return this to the user.

2. The CheckDelivery and MakeVirtualOrder operations require an additional resource, representing a virtual order. A POST operation is used to create this resource with the number of items required. The company id is used to automatically fill in the order form, and the delivery date is calculated. The resource can then be retrieved using a GET operation.

You need to think carefully about how exceptions are mapped onto the standard http response codes, such as a 404 code, meaning that a URL cannot be retrieved. I don't have space to go into this issue here, but it adds a further level of complexity to the service interface design.

Legacy system services

Legacy systems are old software systems that are used by an organization. It may not be cost-effective to rewrite or replace these systems, and many organizations would like to use them in conjunction with more modern systems. One of the most important uses of services is to implement "wrappers" for legacy systems that provide access to a system's functions and data. These systems can then be accessed over the web and integrated with other applications.

http://software-engineering-book.com/web/legacy-services

For SOAP-based services, the realization process is simpler, as the logical interface design can be translated automatically into WSDL. Most programming environments that support service-oriented development (e.g., the ECLIPSE environment) include tools that can translate a logical interface description into its corresponding WSDL representation.

18.3.3 Service implementation and deployment

Once you have identified candidate services and designed their interfaces, the final stage of the service engineering process is service implementation. This implementation may involve programming the service using a language such as Java or C#. Both of these languages include libraries with extensive support for developing SOAP-based and RESTful services.

Alternatively, you can implement services by creating service interfaces to existing components or legacy systems. Software assets that have already proved to be useful can therefore be made available for reuse. In the case of legacy systems, it may mean that the system functionality can be accessed by new applications. You can also develop new services by defining compositions of existing services, as I explain in Section 18.4.

Once a service has been implemented, it then has to be tested before it is deployed. This involves examining and partitioning the service inputs (as explained in Chapter 8), creating input messages that reflect these input combinations, and then checking that the outputs are as expected. You should always try to generate exceptions during the test to check that the service can cope with invalid inputs. For SOAP-based services, testing tools are available that allow services to be examined and tested, and that generate tests from a WSDL specification. However, these tools can only test the conformity of the service interface to the WSDL. They cannot test the service's functional behavior.
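Partition-based service testing, as described above, divides the inputs into equivalence classes and exercises each class, including invalid ones that should raise exceptions. The sketch below uses a stand-in service function with assumed limits, not a real deployed service:

```python
# Partition testing sketch: one valid partition (1..100 items) and two
# invalid partitions (zero/negative and excessive quantities) that
# should raise exceptions. The limits are illustrative assumptions.
def check_delivery(num_items):
    if num_items <= 0:
        raise ValueError("Zero items requested")
    if num_items > 100:
        raise ValueError("No availability")
    return "2014-03-05"   # illustrative fixed delivery date

def run_partition_tests():
    results = {}
    results["valid"] = check_delivery(50) == "2014-03-05"
    for name, value in [("zero", 0), ("too_many", 101)]:
        try:
            check_delivery(value)
            results[name] = False   # an exception was expected here
        except ValueError:
            results[name] = True
    return results
```

Note that a real test suite would also probe the partition boundaries (0, 1, 100, 101), where defects most often cluster.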

Service deployment, the final stage of the process, involves making the service available for use on a web server. Most server software makes this operation straightforward. You install the file containing the executable service in a specific directory, and it then automatically becomes available for use.

If the service is intended to be available within a large organization or as a publicly available service, you then have to provide documentation for external service users. Potential users can then decide if the service is likely to meet their needs and if they can trust you, as a service provider, to deliver the service reliably and securely. Information that you may include in a service description might be:

1. Information about your business, contact details, and so on. This is important for trust reasons. External users of a service have to be confident that it will not behave maliciously. Information about the service provider allows users to check their credentials with business information agencies.

2. An informal description of the functionality provided by the service. This helps potential users to decide if the service is what they want.

3. A description of how to use the service. For simple services, this can be an informal textual description that explains the input and output parameters. For more complex SOAP-based services, the WSDL description may be published.

4. Subscription information that allows users to register for information about updates to the service.

A general difficulty with service specifications is that the functional behavior of the service is usually specified informally, as a natural language description. Natural language descriptions are easy to read, but they are subject to misinterpretation. To address this problem, there has been extensive research on using ontologies and ontology languages for specifying service semantics by marking up the service with ontology information (W3C 2012). However, ontology-based specification is complex and not widely understood. Consequently, it has not been widely used.

18.4 Service composition

The underlying principle of service-oriented software engineering is that you compose and configure services to create new, composite services. These may be integrated with a user interface implemented in a browser to create a web application, or they may be used as components in some other service composition. The services involved in the composition may be specially developed for the application, business services developed within a company, or services from an external provider. Both RESTful and SOAP-based services can be composed to create services with extended functionality.

Many companies have converted their enterprise applications into service-oriented systems, where the basic application building block is a service rather than a component. This allows for widespread reuse within the company. We are now seeing the emergence of interorganizational applications between trusted suppliers, who use each other's services. The final realization of the long-term vision of service-oriented systems will rely on the development of a "services market," where services are bought from trusted external suppliers.

542 Chapter 18 Service-oriented software engineering

Figure 18.13 Vacation package workflow: Book flights → Book hotel → Arrange car or taxi → Browse attractions → Book attractions (information passed includes dates/preferences, arrival/departure dates/times, and hotel location)

Service composition may be used to integrate separate business processes to provide an integrated process offering more extensive functionality. Say an airline wishes to develop a travel aggregation service that provides a complete vacation package for

travelers. In addition to booking their flights, travelers can also book hotels in their preferred location, arrange car rental or book a taxi from the airport, browse a travel guide, and make reservations to visit local attractions. To create this application, the airline composes its own booking service with services offered by a hotel booking agency, rental car and taxi companies, and reservation services offered by owners of local attractions. The end result is a single service that integrates the services from different providers.

You can think of this process as a sequence of separate steps, as shown in Figure 18.13. Information is passed from one step to the next. For example, the rental car company is informed of the time that the flight is scheduled to arrive. The sequence of steps is called a workflow—a set of activities ordered in time, with each activity carrying out some part of the work. A workflow is a model of a business process; that is, it sets out the steps involved in reaching a particular goal that is important for a business. In this case, the business process is the vacation booking service, offered by the airline.

Workflow is a simple idea, and the above scenario of booking a vacation seems to be straightforward. In practice, service composition is usually more complex than this simple model implies. You have to consider the possibility of service failure and include exception management to handle these failures. You also have to take into account nonstandard demands made by users of the application. For example, say a traveler was disabled and required a wheelchair to be rented and delivered to the airport. This would require extra services to be implemented and composed, with additional steps added to the workflow.

When designing a travel aggregation service, you must be able to cope with situations where the normal execution of one of the services results in an incompatibility with some other service execution. For example, say a flight is booked to leave on June 1 and to return on June 7. The workflow then proceeds to the hotel booking stage. However, the resort is hosting a major convention until June 2, so no hotel rooms are available. The hotel booking service reports this lack of availability. This is not a failure; lack of availability is a common situation.

You therefore have to “undo” the flight booking and pass the information about lack of availability back to the user. He or she then has to decide whether to change the dates or the resort. In workflow terminology, this is called a compensation action. Compensation actions are used to undo actions that have already been completed but that must be changed as a result of later workflow activities.
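The compensation action in this scenario can be sketched in code. In the sketch below, FlightService and HotelService are invented stand-ins for the airline and hotel-agency services; their names and methods are assumptions for illustration, not APIs defined in this chapter.

```java
// Sketch of a compensation action in a vacation-booking workflow.
// FlightService and HotelService are hypothetical stand-ins for the
// real airline and hotel-agency services.
public class VacationBooking {

    // Minimal stand-in for the airline's flight-booking service.
    static class FlightService {
        String book(String out, String back) { return "FLIGHT-42"; }
        void cancel(String ref) { /* undo the booking */ }
    }

    // Minimal stand-in for the hotel agency; reports availability.
    static class HotelService {
        boolean checkAvailability(String resort, String out, String back) {
            return false;  // e.g., a convention has taken all the rooms
        }
    }

    /**
     * Books a flight, then checks the hotel. If no rooms are available,
     * the completed flight booking is undone (the compensation action)
     * and the lack of availability is reported back to the traveler.
     */
    static String bookVacation(FlightService flights, HotelService hotels,
                               String resort, String out, String back) {
        String flightRef = flights.book(out, back);
        if (!hotels.checkAvailability(resort, out, back)) {
            flights.cancel(flightRef);   // compensation action
            return "NO_AVAILABILITY";    // user decides: new dates or resort
        }
        return "CONFIRMED:" + flightRef;
    }

    public static void main(String[] args) {
        String result = bookVacation(new FlightService(), new HotelService(),
                                     "Resort", "June 1", "June 7");
        System.out.println(result);  // NO_AVAILABILITY
    }
}
```

The essential point is that the cancellation is an explicit, separate operation: the workflow cannot simply “roll back” a committed booking, so it must invoke an undo operation offered by the service.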

The process of designing new services by reusing existing services is a process of software design with reuse (Figure 18.14). Design with reuse inevitably involves requirements compromises. The “ideal” requirements for the system have to be modified to reflect the services that are actually available, whose costs fall within budget and whose quality of service is acceptable.


Figure 18.14 Service construction by composition: Formulate outline workflow → Discover services → Select services → Refine workflow → Create workflow program → Test service (intermediate artifacts: workflow design, service list, service specifications, workflow design, executable workflow, deployable service)

I have shown the six key stages in the process of system construction by composition in Figure 18.14:

1. Formulate outline workflow In this initial stage of service design, you use the requirements for the composite service as a basis for creating an “ideal” service design. You should create a fairly abstract design at this stage, with the intention of adding details once you know more about available services.

2. Discover services During this stage of the process, you look for existing services to include in the composition. Most service reuse is within enterprises, so this may involve searching local service catalogs. Alternatively, you may search the services offered by trusted service providers, such as Oracle and Microsoft.

3. Select possible services From the set of possible service candidates that you have discovered, you then select possible services that can implement workflow activities. Your selection criteria will obviously include the functionality of the services offered. They may also include the cost of the services and the quality of service (responsiveness, availability, etc.) offered.

4. Refine workflow On the basis of information about the services that you have selected, you then refine the workflow. This involves adding detail to the abstract description and perhaps adding or removing workflow activities. You may then repeat the service discovery and selection stages. Once a stable set of services has been chosen and the final workflow design established, you move on to the next stage in the process.

5. Create workflow program During this stage, the abstract workflow design is transformed to an executable program and the service interface is defined. You can implement workflow programs using a programming language, such as Java or C#, or by using a workflow language, such as BPMN (explained below). This stage may also involve the creation of web-based user interfaces to allow the new service to be accessed from a web browser.

6. Test completed service or application The process of testing the completed, composite service is more complex than component testing in situations where external services are used. I discuss testing issues in Section 18.4.2.
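Stage 3 above, selecting possible services, can be sketched as a simple filter over the discovered candidates. In this sketch, the ServiceCandidate class and the cost and availability thresholds are illustrative assumptions; real service catalogs do not expose a standard selection API of this kind.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "select possible services" stage: filtering discovered
// service candidates by cost and quality of service.
public class ServiceSelection {

    // Hypothetical record of a discovered candidate service.
    static class ServiceCandidate {
        final String name;
        final double costPerCall;    // e.g., cents per invocation
        final double availability;   // fraction of time available, 0..1
        ServiceCandidate(String name, double costPerCall, double availability) {
            this.name = name;
            this.costPerCall = costPerCall;
            this.availability = availability;
        }
    }

    /** Keeps only candidates whose cost and availability are acceptable. */
    static List<String> select(List<ServiceCandidate> discovered,
                               double maxCost, double minAvailability) {
        List<String> selected = new ArrayList<>();
        for (ServiceCandidate c : discovered) {
            if (c.costPerCall <= maxCost && c.availability >= minAvailability) {
                selected.add(c.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<ServiceCandidate> found = new ArrayList<>();
        found.add(new ServiceCandidate("HotelBookingA", 0.5, 0.999));
        found.add(new ServiceCandidate("HotelBookingB", 2.0, 0.95));
        System.out.println(select(found, 1.0, 0.99));  // [HotelBookingA]
    }
}
```

In practice, functionality matching comes first and is much harder to automate than this numeric filter; cost and quality of service are the criteria that lend themselves to mechanical comparison.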

This process assumes that existing services are available for composition. If you rely on external information that is not available through a service interface, you may have to implement these services yourself. This usually involves a “screen scraping” process where your program extracts information from the HTML text of web pages that are sent to a browser for rendering.
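A minimal screen-scraping wrapper might look like the following sketch, which pulls a value out of page HTML with a regular expression so that it can then be offered through a service interface. The page layout and the price markup are invented for illustration; a real scraper is tied to the actual page structure and breaks whenever that structure changes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a "screen scraping" wrapper: extracting a value from the
// HTML of a web page. The <span class="price"> markup is a made-up
// example, not the structure of any real site.
public class PriceScraper {

    /** Pulls the first price out of an HTML fragment, or null if absent. */
    static String extractPrice(String html) {
        Pattern p = Pattern.compile("<span class=\"price\">([^<]+)</span>");
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String page = "<html><body>Room rate: "
                    + "<span class=\"price\">$120</span></body></html>";
        System.out.println(extractPrice(page));  // $120
    }
}
```

The fragility of this approach is exactly why screen scraping is a last resort: the wrapper encodes assumptions about presentation markup rather than a stable, published service interface.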

18.4.1 Workflow design and implementation

Workflow design involves analyzing existing or planned business processes to understand the tasks involved and how these tasks exchange information. You then define the new business process in a workflow design notation. This sets out the stages involved in enacting the process and the information that is passed between the different process stages. However, existing processes may be informal and dependent on the skills and ability of the people involved. There may be no “normal” way of working or process definition. In such cases, you have to use your knowledge of the current process to design a workflow that achieves the same goals.

Workflows represent business process models. They are graphical models that are written using UML activity diagrams or BPMN, the Business Process Modeling Notation (White and Miers 2008; OMG 2011). I use BPMN for the examples in this chapter. If you use SOAP-based services, it is possible to convert BPMN workflows automatically into WS-BPEL, an XML-based workflow language. This is conformant with other web service standards such as SOAP and WSDL. RESTful services may be composed within a program in a standard programming language such as Java. Alternatively, a composition language used for service mashups may be used (Rosenberg et al. 2008).

Figure 18.15 is an example of a simple BPMN model of part of the vacation package scenario, shown in Figure 18.13. The model shows a simplified workflow for hotel booking and assumes the existence of a Hotels service with associated operations called GetRequirements, CheckAvailability, ReserveRooms, NoAvailability, ConfirmReservation, and CancelReservation. The process involves getting requirements from the customer, checking room availability, and then, if rooms are available, making a booking for the required dates.
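This hotel booking workflow could equally be expressed directly as a program. The following sketch assumes a hypothetical Hotels interface whose methods mirror the operations named above; the method signatures and types are illustrative guesses, not a real service API.

```java
// Sketch of the hotel-booking workflow expressed as a program rather
// than a BPMN model. The Hotels interface mirrors the operations named
// in the text; its signatures are assumptions for illustration.
public class HotelBookingWorkflow {

    interface Hotels {
        String getRequirements();                // dates, location, rooms
        boolean checkAvailability(String reqs);  // the gateway condition
        String reserveRooms(String reqs);
        void confirmReservation(String reservation);
        void noAvailability(String reqs);        // report back to customer
    }

    /** Runs the workflow; returns the reservation id, or null if no rooms. */
    static String run(Hotels hotels) {
        String reqs = hotels.getRequirements();
        if (hotels.checkAvailability(reqs)) {    // "Rooms OK" path
            String reservation = hotels.reserveRooms(reqs);
            hotels.confirmReservation(reservation);
            return reservation;
        }
        hotels.noAvailability(reqs);             // "No rooms": retry or cancel
        return null;
    }

    public static void main(String[] args) {
        Hotels stub = new Hotels() {
            public String getRequirements() { return "2 rooms, June 1-7"; }
            public boolean checkAvailability(String reqs) { return true; }
            public String reserveRooms(String reqs) { return "RES-001"; }
            public void confirmReservation(String reservation) { }
            public void noAvailability(String reqs) { }
        };
        System.out.println(run(stub));  // RES-001
    }
}
```

The if-statement plays the role of the BPMN gateway: the availability check determines which branch of the workflow is followed.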

This model introduces some of the core concepts of BPMN that are used to create workflow models:

1. Rectangles with rounded corners represent activities. An activity can be executed by a human or by an automated service.

2. Circles represent discrete events. An event is something that happens during a business process. A simple circle is used to represent a starting event and a darker circle to represent an end event. A double circle (not shown) is used to represent an intermediate event. Events can be clock events, thus allowing workflows to be executed periodically or timed out.

3. A diamond is used to represent a gateway. A gateway is a stage in the process where some choice is made. For example, in Figure 18.15, a choice is made on the basis of whether or not rooms are available.

4. A solid arrow shows the sequence of activities; a dashed arrow represents message flow between activities. In Figure 18.15, these messages are passed between the hotel booking service and the customer.


Figure 18.15 A fragment of a hotel booking workflow: Hotels.GetRequirements → Hotels.CheckAvailability; if rooms are available (“Rooms OK”), Hotels.ReserveRooms → Hotels.ConfirmReservation; otherwise (“No rooms”), Hotels.NoAvailability, followed by a retry or a cancellation, with messages exchanged with the customer

These key features are enough to describe most workflows. However, BPMN includes many additional features that I don’t have space to describe here. These add information to a business process description that allows it to be automatically translated into an executable service.

Figure 18.15 shows a process that is enacted in a single organization, the company that provides a booking service. However, the key benefit of a service-oriented approach is that it supports interorganizational computing. This means that a computation involves processes and services in different companies. This process is represented in BPMN by developing separate workflows for each of the organizations involved, with interactions between them.

To illustrate multiple workflow processes, I use a different example, drawn from high-performance computing, where hardware is offered as a service. Services are created to provide access to high-performance computers to a geographically distributed user community. In this example, a vector-processing computer (a machine that can carry out parallel computations on arrays of values) is offered as a service (VectorProcService) by a research laboratory. This is accessed through another service called SetupComputation. These services and their interactions are shown in Figure 18.16.

In this example, the workflow for the SetupComputation service asks for access to a vector processor and, if a processor is available, establishes the computation required and downloads data to the processing service. Once the computation is complete, the results are stored on the local computer. The workflow for VectorProcService includes the following steps:

Check if a processor is available
Allocate resources for the computation
Initialize the system
Carry out the computation
Return the results to the client service

Figure 18.16 Interacting workflows. SetupComputation: Request processor (with “No processor” leading to Restart or Fail) → Set up job parameters → Download data → Start computation → Store results → Report completion. VectorProcService: Check availability → Allocate resources → Initialize → Compute → Return results
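The SetupComputation side of this interaction can be sketched as follows. The VectorProc interface stands in for the remote VectorProcService; its operations are assumptions based on the steps listed in the text, not a published API.

```java
// Sketch of the SetupComputation workflow: request a processor, and
// if one is granted, set up the job, download the data, and run the
// computation. VectorProc is a hypothetical stand-in for the remote
// VectorProcService.
public class SetupComputation {

    interface VectorProc {
        boolean requestProcessor();          // may fail: no processor free
        void setUpJobParameters(String job);
        void downloadData(String dataset);
        double[] compute();                  // returns the results
    }

    /** Returns the results, or null if no processor was available. */
    static double[] runJob(VectorProc service, String job, String dataset) {
        if (!service.requestProcessor()) {
            return null;                     // caller may restart or fail
        }
        service.setUpJobParameters(job);
        service.downloadData(dataset);
        double[] results = service.compute();
        // "Store results" and "Report completion" would follow here.
        return results;
    }

    public static void main(String[] args) {
        VectorProc stub = new VectorProc() {
            public boolean requestProcessor() { return true; }
            public void setUpJobParameters(String job) { }
            public void downloadData(String dataset) { }
            public double[] compute() { return new double[] {1.0, 2.0}; }
        };
        double[] results = runJob(stub, "matrix-multiply", "dataset-1");
        System.out.println(results.length);  // 2
    }
}
```

Each side of the interaction runs its own workflow; the method calls here correspond to the messages that coordinate the two pools in the BPMN model.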

In BPMN terms, the workflow for each organization is represented in a separate pool. It is shown graphically by enclosing the workflow for each participant in the process in a rectangle, with the name written vertically on the left edge. The workflows in each pool are coordinated by exchanging messages. In situations where different parts of an organization are involved in a workflow, pools are divided into named “lanes.” Each lane shows the activities in that part of the organization.

Once a business process model has been designed, it has to be refined depending on the services that have been discovered. As I suggested in the discussion of Figure 18.14, the model may go through a number of iterations until a design that allows the maximum possible reuse of available services has been created.

Once the final design is available, you can then develop the final service-oriented system. This involves implementing services that are not available for reuse and converting the workflow model into an executable program. As services are implementation-language independent, new services can be written in any language. The workflow model may be automatically processed to create an executable WS-BPEL model if SOAP-based services are used. Alternatively, if RESTful services are used, the workflow may be manually programmed, with the model acting as a program specification.

18.4.2 Testing service compositions

Testing is important in all system development processes as it demonstrates that a system meets its functional and non-functional requirements and detects defects that have been introduced during the development process. Many testing techniques, such as program inspections and coverage testing, rely on analysis of the software source code. However, if you use services from an external provider, you will not have access to the source code of the service implementations. You cannot therefore use “white box” testing techniques that rely on the source code of the system.

As well as problems of understanding the implementation of the service, testers may also face further difficulties when testing service compositions:

1. External services are under the control of the service provider rather than the user of the service. The service provider may withdraw these services at any time or may make changes to them, which invalidates any previous application testing. These problems are handled in software components by maintaining different versions of the component, but service versions are not normally supported.

2. If services are dynamically bound, an application may not always use the same service each time that it is executed. Therefore, tests may be successful when an application is bound to a particular service, but it cannot be guaranteed that that service will be used during an actual execution of the system. This problem has been one reason why dynamic binding has not been widely used.

3. The non-functional behavior of a service is not simply dependent on how it is used by the application that is being tested. A service may perform well during testing because it is not operating under a heavy load. In practice, the observed service behavior may be different because of the demands made by other service users.

4. The payment model for services could make service testing very expensive. There are different possible payment models: Some services may be freely available, some may be paid for by subscription, and others may be paid for on a per-use basis. If services are free, then the service provider will not wish them to be loaded by applications being tested; if a subscription is required, then a service user may be reluctant to enter into a subscription agreement before testing the service. Similarly, if the usage is based on payment for each use, service users may find the cost of testing to be prohibitive.

5. I have discussed the notion of compensation actions that are invoked when an exception occurs and previous commitments that have been made (such as a flight reservation) have to be revoked. There is a problem in testing such actions as they may depend on the failure of other services. Simulating the failure of these services during the testing process is usually difficult.
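One common workaround for the last problem is to test compensation actions against a stub that simulates the failure of the downstream service, as in the sketch below. The BookingService interface and the RecordingStub are illustrative test doubles; in practice such doubles would often come from a mocking framework.

```java
// Sketch of testing a compensation action with a service stub that
// simulates downstream failure. BookingService and RecordingStub are
// illustrative test doubles, not a real service API.
public class CompensationTest {

    interface BookingService {
        String book();
        void cancel(String ref);
    }

    // Stub that records whether the compensation (cancel) was invoked.
    static class RecordingStub implements BookingService {
        boolean cancelled = false;
        public String book() { return "REF-1"; }
        public void cancel(String ref) { cancelled = true; }
    }

    /**
     * Workflow fragment under test: makes a booking, then compensates
     * (cancels it) if the next workflow step fails.
     */
    static boolean bookThenContinue(BookingService flights, boolean nextStepOk) {
        String ref = flights.book();
        if (!nextStepOk) {
            flights.cancel(ref);   // compensation action
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        RecordingStub stub = new RecordingStub();
        bookThenContinue(stub, false);       // force the downstream failure
        System.out.println(stub.cancelled);  // true
    }
}
```

The stub makes the failure repeatable and free of charge, which addresses the simulation difficulty, though it cannot show how the real external service behaves when it actually fails.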

These problems are particularly acute when external services are used. They are less serious when services are used within the same company or where cooperating companies trust services offered by their partners. In such cases, source code may be available to guide the testing process, and payment for services is unlikely to be a problem. Resolving these testing problems and producing guidelines, tools, and techniques for testing service-oriented applications remains an important research issue.


Key Points

Service-oriented architecture is an approach to software engineering where reusable, standardized services are the basic building blocks for application systems.

Services may be implemented within a service-oriented architecture using a set of XML-based web service standards. These include standards for service communication, interface definition, and service enactment in workflows.

Alternatively, a RESTful architecture may be used, which is based on resources and standard operations on these resources. A RESTful approach uses the HTTP and HTTPS protocols for service communication and maps operations onto the standard HTTP verbs POST, GET, PUT, and DELETE.

Services may be classified as utility services that provide general-purpose functionality, business services that implement part of a business process, or coordination services that coordinate the execution of other services.

The service engineering process involves identifying candidate services for implementation, defining the service interface, and implementing, testing, and deploying the service.

The development of software using services is based on the idea that programs are created by composing and configuring services to create new composite services and systems.

Graphical workflow languages, such as BPMN, may be used to describe a business process and the services used in that process. These languages can describe interactions between the organizations that are involved.

Further Reading

There is an immense amount of tutorial material on the web covering all aspects of web services. However, I found the book by Thomas Erl to be the best overview and description of services and service standards. Erl includes some discussion of software engineering issues in service-oriented computing. He has also written more recent books on RESTful services.

Service-Oriented Architecture: Concepts, Technology and Design. Erl has written a number of books on service-oriented systems covering both SOA and RESTful architectures. In this book, Erl discusses SOA and web service standards but mostly concentrates on discussing how a service-oriented approach may be used at all stages of the software process. (T. Erl, Prentice-Hall, 2005).

“Service-oriented architecture.” This is a good, readable introduction to SOA. (Various authors, 2008) http://msdn.microsoft.com/en-us/library/bb833022.aspx

“RESTful Web Services: The Basics.” A good introductory tutorial on the RESTful approach and RESTful services. (A. Rodriguez, 2008). https://www.ibm.com/developerworks/webservices/library/ws-restful/

Service Design Patterns: Fundamental Design Solutions for SOAP/WSDL and RESTful Web Services. This is a more advanced text for developers who wish to use web services in enterprise applications. It describes a number of common problems and abstract web service solutions to these problems. (R. Daigneau, Addison-Wesley, 2012).


“Web Services Tutorial.” This is an extensive tutorial on all aspects of service-oriented architecture, web services, and web service standards, written by people involved in the development of these standards. Very useful if you need a detailed understanding of the standards. (W3Schools, 1999–2014) http://www.w3schools.com/webservices/default.asp

Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/software-reuse/

Exercises

18.1. Why is it important to define exceptions in service engineering?

18.2. Standards are fundamental to service-oriented architectures, and it was believed that standards conformance was essential for successful adoption of a service-based approach. However, RESTful services, which are increasingly widely used, are not standards-based. Discuss why you think this change has occurred and whether or not you think that the lack of standards will inhibit the development and takeup of RESTful services.

18.3. Extend Figure 18.5 to include WSDL definitions for MaxMinType and InDataFault. The temperatures should be represented as integers, with an additional field indicating whether the temperature is in degrees Fahrenheit or degrees Celsius. InDataFault should be a simple type consisting of an error code.

18.4. Suggest how the SimpleInterestCalculator service could be implemented as a RESTful service.

18.5. What is a workflow? List the key stages in the process of system construction by composition.

18.6. Design possible input and output messages for the services shown in Figure 18.13. You may specify these in the UML or in XML.

18.7. Giving reasons for your answer, suggest two important types of application where you would not recommend the use of service-oriented architecture.

18.8. Explain what is meant by a “compensation action” and, using an example, show why these actions may have to be included in workflows.


18.9. For the example of the vacation package reservation service, design a workflow that will book ground transportation for a group of passengers arriving at an airport. They should be given the option of booking either a taxi or a hire car. You may assume that the taxi and rental car companies offer web services to make a reservation.

18.10. Using an example, explain in detail why the thorough testing of services that include compensation actions is difficult.

References

Erl, T. 2004. Service-Oriented Architecture: A Field Guide to Integrating XML and Web Services. Upper Saddle River, NJ: Prentice-Hall.

Erl, T. 2005. Service-Oriented Architecture: Concepts, Technology and Design. Upper Saddle River, NJ: Prentice-Hall.

Fielding, R. 2000. “Representational State Transfer.” Architectural Styles and the Design of Network-Based Software Architecture. https://www.ics.uci.edu/~fielding/pubs/.../fielding_dissertation.pdf

Lovelock, C., S. Vandermerwe, and B. Lewis. 1996. Services Marketing. Englewood Cliffs, NJ: Prentice-Hall.

Newcomer, E., and G. Lomow. 2005. Understanding SOA with Web Services. Boston: Addison-Wesley.

OMG. 2011. “Documents Associated with Business Process Model and Notation (BPMN) Version 2.0.” http://www.omg.org/spec/BPMN/2.0/

Pautasso, C., O. Zimmermann, and F. Leymann. 2008. “RESTful Web Services vs. ‘Big’ Web Services: Making the Right Architectural Decision.” In Proc. WWW 2008, 805–814. Beijing, China. doi:10.1145/1367497.1367606.

Richardson, L., and S. Ruby. 2007. RESTful Web Services. Sebastopol, CA: O’Reilly Media Inc.

Rosenberg, F., F. Curbera, M. Duftler, and R. Khalaf. 2008. “Composing RESTful Services and Collaborative Workflows: A Lightweight Approach.” IEEE Internet Computing 12 (5): 24–31. doi:10.1109/MIC.2008.98.

W3C. 2012. “OWL 2 Web Ontology Language.” http://www.w3.org/TR/owl2-overview/

W3C. 2013. “Web of Services.” http://www.w3.org/standards/webofservices/

White, S. A., and D. Miers. 2008. BPMN Modeling and Reference Guide: Understanding and Using BPMN. Lighthouse Point, FL: Future Strategies Inc.

19 Systems engineering

Objectives

The objectives of this chapter are to explain why software engineers should understand systems engineering and to introduce the most important systems engineering processes. When you have read this chapter, you will:

know what is meant by a sociotechnical system and understand why human, social, and organizational issues affect the requirements and design of software systems;

understand the idea of conceptual design and why it is an essential first stage in the systems engineering process;

know what is meant by system procurement and understand why different system procurement processes are used for different types of system;

know about the key systems engineering development processes and their relationships.

Contents

19.1 Sociotechnical systems

19.2 Conceptual design

19.3 System procurement

19.4 System development

19.5 System operation and evolution


A computer only becomes useful when it includes both software and hardware. Without hardware, a software system is an abstraction—simply a representation of some human knowledge and ideas. Without software, a hardware system is a set of inert electronic devices. However, if you put them together to form a computer system, you create a machine that can carry out complex computations and deliver the results of these computations to its environment.

This illustrates one of the fundamental characteristics of a system: It is more than the sum of its parts. Systems have properties that only become apparent when their components are integrated and operate together. Furthermore, systems are developed to support human activities—work, entertainment, communication, protection of people and the environment, and so on. They interact with people, and their design is influenced by human and organizational concerns. Hardware, human, social, and organizational factors have to be taken into account when developing all professional software systems.

Systems that include software fall into two categories:

1. Technical computer-based systems are systems that include hardware and software components but not procedures and processes. Examples of technical systems include televisions, mobile phones, and other equipment with embedded software. Applications for PCs, computer games, and mobile devices are also technical systems. Individuals and organizations use technical systems for a particular purpose, but knowledge of this purpose is not part of the technical system. For example, the word processor I am using (Microsoft Word) is not aware that it is being used to write a book.

2. Sociotechnical systems include one or more technical systems but, crucially, also people, who understand the purpose of the system, within the system itself. Sociotechnical systems have defined operational processes, and people (the operators) are inherent parts of the system. They are governed by organizational policies and rules and may be affected by external constraints such as national laws and regulatory policies. For example, this book was created through a sociotechnical publishing system that includes various processes (creation, editing, layout, etc.) and technical systems (Microsoft Word and Excel, Adobe Illustrator, InDesign, etc.).

Systems engineering (White et al. 1993; Stevens et al. 1998; Thayer 2002) is the activity of designing entire systems, taking into account the characteristics of hardware, software, and human elements of these systems. Systems engineering includes everything to do with procuring, specifying, developing, deploying, operating, and maintaining both technical and sociotechnical systems. Systems engineers have to consider the capabilities of hardware and software as well as the system’s interactions with users and its environment. They must think about the system’s services, the constraints under which the system must be built and operated, and the ways in which the system is used.

In this chapter, my focus is on the engineering of large and complex software-intensive systems. These are “enterprise systems,” that is, systems that are used to support the goals of a large organization. Enterprise systems are used by government and the military services as well as large companies and other public bodies.

Figure 19.1 Stages of systems engineering: Conceptual design → Procurement → Development → Operation → System evolution (information flows include the system vision and features, outline requirements, deployment, equipment and software updates, and user information)

They are sociotechnical systems that are influenced by the ways that the organization works and by national and international rules and regulations. They may be made up of a number of separate systems and are distributed systems with large-scale databases. They have a long lifetime and are critical for the operation of the enterprise.

I believe that it is important for software engineers to know about systems engineering and to be active participants in systems engineering processes for two reasons:

1. Software is now the dominant element in all enterprise systems, yet many senior decision makers in organizations have a limited understanding of software. Software engineers have to play a more active part in high-level systems decision making if the system software is to be dependable and developed on time and to budget.

2. As a software engineer, it helps if you have a broader awareness of how software interacts with other hardware and software systems, and the human, social, and organizational factors that affect the ways in which software is used. This knowledge helps you understand the limits of software and to design better software systems.

There are four overlapping stages (Figure 19.1) in the lifetime of large, complex systems:

1. Conceptual design This initial systems engineering activity develops the concept of the type of system that is required. It sets out, in nontechnical language, the purpose of the system, why it is needed, and the high-level features that users might expect to see in the system. It may also describe broad constraints, such as the need for interoperability with other systems. These limit the freedom of systems engineers in designing and developing the system.

2. Procurement or acquisition During this stage, the conceptual design is further developed so that information is available to make decisions about the contract for the system development. This may involve making decisions about the distribution of functionality across hardware, software, and operational processes. You also make decisions about which hardware and software has to be acquired, which suppliers should develop the system, and the terms and conditions of the supply contract.

3. Development During this stage, the system is developed. Development processes include requirements definition, system design, hardware and software engineering, system integration, and testing. Operational processes are defined, and the training courses for system users are designed.

4. Operation At this stage, the system is deployed, users are trained, and the system is brought into use. The planned operational processes usually then have to change to reflect the real working environment where the system is used. Over time, the system evolves as new requirements are identified. Eventually, the system declines in value, and it is decommissioned and replaced.

Figure 19.1 shows the interactions between these stages. The conceptual design activity is a basis for the system procurement and development but is also used to provide information to users about the system. Development and procurement overlap, and further procurement during development and operation may be needed as new equipment and software become available. Once the system is operational, requirements changes are inevitable; implementing these changes requires further development and, perhaps, software and hardware procurement.

Decisions made at any one of these stages may have a profound influence on the other stages. Design options may be restricted by procurement decisions on the scope of the system and on its hardware and software. Human errors made during the specification, design, and development stages may mean that faults are introduced into the system. A decision to limit testing for budget reasons may mean that faults are not discovered before a system is put into use. During operation, errors in configuring the system for deployment may lead to problems in using the system. Decisions made during the original procurement may be forgotten when system changes are proposed. This may lead to unforeseen consequences arising from the implementation of the changes.

An important difference between systems and software engineering is the involvement of a range of professionals throughout the lifetime of the system. These include engineers who may be involved in hardware and software design, system end-users, managers who are concerned with organizational issues, and experts in the system's application domain. For example, engineering the insulin pump system introduced in Chapter 1 requires experts in electronics, mechanical engineering, software, and medicine.

For very large systems, an even wider range of expertise may be required. Figure 19.2 illustrates the technical disciplines that may be involved in the procurement and development of a new system for air traffic management. Architects and civil engineers are involved because new air traffic management systems usually have to be installed in a new building. Electrical and mechanical engineers are involved to specify and maintain the power and air conditioning. Electronic engineers are concerned with computers, radars, and other equipment. Ergonomists design the controller workstations, and software engineers and user interface designers are responsible for the software in the system.

Figure 19.2 Professional disciplines involved in ATC systems engineering: architecture, civil engineering, electrical engineering, electronic engineering, mechanical engineering, ergonomics, user interface design, software engineering, and systems engineering.

The involvement of a range of professional disciplines is essential because of the different types of components in complex systems. However, differences and misunderstandings between disciplines can lead to inappropriate design decisions. These poor decisions can delay the system's development or make it less suitable for its intended purpose. There are three reasons why there may be misunderstandings or other differences between engineers with different backgrounds:

1. Different professional disciplines often use the same words, but these words do not always mean the same thing. Consequently, misunderstandings are common in discussions between engineers from different backgrounds. If these are not discovered and resolved during system development, they can lead to errors in delivered systems. For example, an electronic engineer may know a bit about C programming but may not understand that a method in Java is like a function in C.

2. Each discipline makes assumptions about what other disciplines can or cannot do. These assumptions are often based on an inadequate understanding of what is possible. For example, an electronic engineer may decide that all signal processing (a computationally intensive task) should be done by software to simplify the hardware design. However, this may mean significantly greater software effort to ensure that the system processor can cope with the amount of computation that is required.

3. Disciplines try to protect their professional boundaries and may argue for certain design decisions because these decisions will call for their professional expertise. Therefore, a software engineer may argue for a software-based door locking system in a building, although a mechanical, key-based system may be more reliable.

My experience is that interdisciplinary working can be successful only if enough time is available for these issues to be discussed and resolved. This requires regular face-to-face discussions and a flexible approach from everyone involved in the systems engineering process.


19.1 Sociotechnical systems

The term system is universally used. We talk about computer systems, operating systems, payment systems, the education system, the system of government, and so on. These are all obviously quite different uses of the word "system," although they share the essential characteristic that, somehow, the system is more than simply the sum of its parts.

Abstract systems, such as the system of government, are outside the scope of this book. I focus here on systems that include computers and software and that have some specific purpose, such as to enable communication, support navigation, or maintain medical records. A useful working definition of these types of system is as follows:

A system is a purposeful collection of interrelated components of different kinds that work together to deliver a set of services to the system owner and its users.

This general definition can cover a very wide range of systems. For example, a simple system, such as a laser pointer, delivers an indication service. It may include a few hardware components with a tiny control program in read-only memory (ROM). By contrast, an air traffic control system includes thousands of hardware and software components as well as human users who make decisions based on information from that computer system. It delivers a range of services, including providing information to pilots, maintaining safe separation of planes, utilizing airspace, and so on.

In all complex systems, the properties and behavior of the system components are inextricably intermingled. The successful functioning of each system component depends on the functioning of other components. Software can only operate if the processor is operational. The processor can only carry out computations if the software system defining these computations has been successfully installed.

Large-scale systems are often "systems of systems." That is, they are made up of several separate systems. For example, a police command and control system may include a geographical information system to provide details of the location of incidents. The same geographical information system may be used in systems for transport logistics and emergency command and control. Engineering systems of systems is an increasingly important topic in software engineering that I cover in Chapter 20.

Large-scale systems are, with a few exceptions, sociotechnical systems, which I explained in Chapter 10. That is, they do not just include software and hardware but also people, processes, and organizational policies. Sociotechnical systems are enterprise systems that are intended to help deliver a business purpose. This purpose might be to increase sales, reduce material used in manufacturing, collect taxes, maintain a safe airspace, and so on. Because they are embedded in an organizational environment, the procurement, development, and use of these systems are influenced by the organization's policies and procedures, as well as by its working culture. The users of the system are people who are influenced by the way the organization is managed and by their interactions with other people inside and outside of the organization.

The close relationships between sociotechnical systems and the organizations that use these systems mean that it is often difficult to establish system boundaries.

Figure 19.3 Layered structure of sociotechnical systems. From the outside in: national laws and regulations; organizational strategies and goals; organizational culture; organizational policies and rules; operational processes; and, at the core, the technical system.

Different people within the organization will see the boundaries of the system in different ways. This is significant because establishing what is and what is not in the scope of the system is important when defining the system requirements.

Figure 19.3 illustrates this problem. The diagram shows a sociotechnical system as a set of layers, where each layer contributes, in some way, to the functioning of the system. At the core is a software-intensive technical system and its operational processes (shaded in Figure 19.3). Most people would agree that these are both parts of the system. However, the system's behavior is influenced by a range of sociotechnical factors outside of the core. Should the system boundary simply be drawn around the core, or should it include other organizational levels?

Whether or not these broader sociotechnical considerations should be considered to be part of a system depends on the organization and its policies and rules. If organizational rules and policies can be changed, then some people might argue they should be part of the system. However, it is more difficult to change organizational culture and even more challenging to change strategy and goals. Only governments can change laws to accommodate a system. Moreover, different stakeholders may have different opinions on where the system boundaries should be drawn. There are no simple answers to these questions, but they have to be discussed and negotiated during the system design process.

Generally, large sociotechnical systems are used in organizations. When you are designing and developing sociotechnical systems, you need to understand, as far as possible, the organizational environment in which they will be used. If you don't, the systems may not meet business needs. Users and their managers may reject the system or fail to use it to its full potential.

Figure 19.4 shows the key elements in an organization that may affect the requirements, design, and operation of a sociotechnical system. A new system may lead to changes in some or all of these elements:

1. Process changes A new system may mean that people have to change the way that they work. If so, training will certainly be required. If changes are significant, or if they involve people losing their jobs, there is a danger that the users will resist the introduction of the system.

Figure 19.4 Organizational elements: processes, policies, jobs, politics, and systems.

2. Job changes New systems may deskill the users in an environment or cause them to change the way they work. If so, users may actively resist the introduction of the system into the organization. Professional staff, such as doctors or teachers, may resist system designs that require them to change their normal way of working. The people involved may feel that their professional expertise is being eroded and that their status in the organization is being reduced by the system.

3. Organizational policies The proposed system may not be completely consistent with organizational policies (e.g., on privacy). This may require system changes, policy changes, or process changes to bring the system and policies into line.

4. Organizational politics The system may change the political power structure in an organization. For example, if an organization is dependent on a complex system, those who control access to that system have a great deal of political power. Alternatively, if an organization reorganizes itself into a different structure, this may affect the requirements and use of the system.

Sociotechnical systems are complex systems, which means that it is practically impossible to have a complete understanding, in advance, of their behavior. This complexity leads to three important characteristics of sociotechnical systems:

1. They have emergent properties that are properties of the system as a whole, rather than associated with individual parts of the system. Emergent properties depend on both the system components and the relationships between them. Some of these relationships only come into existence when the system is integrated from its components, so the emergent properties can only be evaluated at that time. Security and dependability are examples of important emergent system properties.

2. They are nondeterministic, so that when presented with a specific input, they may not always produce the same output. The system's behavior depends on the human operators, and people do not always react in the same way. Furthermore, use of the system may create new relationships between the system components and hence change its emergent behavior.

3. The system's success criteria are subjective rather than objective. The extent to which the system supports organizational objectives does not just depend on the system itself. It also depends on the stability of these objectives, the relationships and conflicts between organizational objectives, and how people in the organization interpret these objectives. New management may reinterpret the organizational objectives that a system was designed to support so that a "successful" system may then be seen as no longer fit for its intended purpose.

Figure 19.5 Examples of emergent properties

Reliability: System reliability depends on component reliability, but unexpected interactions can cause new types of failure and therefore affect the reliability of the system.

Repairability: This property reflects how easy it is to fix a problem with the system once it has been discovered. It depends on being able to diagnose the problem, access the components that are faulty, and modify or replace these components.

Security: The security of the system (its ability to resist attack) is a complex property that cannot be easily measured. Attacks may be devised that were not anticipated by the system designers and so may defeat built-in safeguards.

Usability: This property reflects how easy it is to use the system. It depends on the technical system components, its operators, and its operating environment.

Volume: The volume of a system (the total space occupied) depends on how the component assemblies are arranged and connected.

Sociotechnical considerations are often critical in determining whether or not a system has successfully met its objectives. Unfortunately, taking these into account is very difficult for engineers who have little experience of social or cultural studies. To help understand the effects of systems on organizations, various sociotechnical systems methodologies have been proposed. My paper on sociotechnical systems design discusses the advantages and disadvantages of these sociotechnical design methodologies (Baxter and Sommerville 2011).

19.1.1 Emergent properties

The complex relationships between the components in a system mean that a system is more than simply the sum of its parts. It has properties that are properties of the system as a whole. These "emergent properties" (Checkland 1981) cannot be attributed to any specific part of the system. Rather, they only emerge once the system components have been integrated. Some emergent properties, such as weight, can be derived directly from the subsystem properties. More often, however, they emerge from a combination of subsystem properties and subsystem relationships. The system property cannot be calculated directly from the properties of the individual system components. Examples of emergent properties are shown in Figure 19.5.
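The distinction between derivable and non-derivable emergent properties can be made concrete with a small sketch. The component names and weights below are invented for illustration: a property like weight is simply the sum of the component values, whereas a property like reliability has no such formula, because it also depends on how the components interact once integrated.

```python
# Illustrative sketch with invented components (a bicycle-like assembly).
components = [
    {"name": "frame", "weight_kg": 2.5},
    {"name": "wheels", "weight_kg": 1.8},
    {"name": "drivetrain", "weight_kg": 1.2},
]

# Weight is derivable: it is just the sum of the component weights,
# so it can be computed before the system is integrated.
total_weight = sum(c["weight_kg"] for c in components)
print(round(total_weight, 1))  # 5.5

# Reliability has no comparable formula. It depends on the
# relationships between components, so it can only be evaluated
# on the assembled system.
```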

There are two types of emergent properties:

1. Functional emergent properties, when the purpose of a system only emerges after its components are integrated. For example, a bicycle has the functional property of being a transportation device once it has been assembled from its components.

2. Non-functional emergent properties, which relate to the behavior of the system in its operational environment. Reliability, performance, safety, and security are examples of these properties. These system characteristics are critical for computer-based systems, as failure to achieve a minimum defined level in these properties usually makes the system unusable. Some users may not need some of the system functions, so the system may be acceptable without them. However, a system that is unreliable or too slow is likely to be rejected by all its users.

Emergent properties, such as reliability, depend on both the properties of individual components and their interactions or relationships. For example, the reliability of a sociotechnical system is influenced by three things:

1. Hardware reliability What is the probability of hardware components failing, and how long does it take to repair a failed component?

2. Software reliability How likely is it that a software component will produce an incorrect output? Software failure is unlike hardware failure in that software does not wear out. Failures are often transient. The system carries on working after an incorrect result has been produced.

3. Operator reliability How likely is it that the operator of a system will make an error and provide an incorrect input? How likely is it that the software will fail to detect this error and propagate the mistake?
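As a crude numerical sketch, the three factors can be combined by multiplying the probability that each behaves correctly. The figures below are invented, and the multiplication treats the factors as independent, which, as the text goes on to note, they are not; the point is only that individually reliable parts compound into a noticeably lower system-level figure.

```python
def transaction_reliability(p_hw, p_sw, p_op):
    """Probability that one transaction succeeds, multiplying the
    probabilities that the hardware, software, and operator each
    behave correctly. Independence is a simplifying assumption;
    in real sociotechnical systems the factors interact."""
    return p_hw * p_sw * p_op

# Invented figures: 99.9% hardware, 99% software, 95% operator.
print(round(transaction_reliability(0.999, 0.99, 0.95), 4))  # 0.9396
```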

Hardware, software, and operator reliability are not independent but affect each other in unpredictable ways. Figure 19.6 shows how failures at one level can be propagated to other levels in the system. Say a hardware component in a system starts to go wrong. Hardware failure can sometimes generate spurious signals that are outside the range of inputs expected by the software. The software can then behave unpredictably and produce unexpected outputs. These may confuse and consequently cause stress in the system operator.

We know that people are more likely to make mistakes when they feel stressed. So a hardware failure may be the trigger for operator errors. These mistakes can, in turn, lead to unexpected software behavior, resulting in additional demands on the processor. This could overload the hardware, causing more failures and so on. Thus, an initial, relatively minor failure can rapidly develop into a serious problem that could lead to a complete shutdown of the system.
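The feedback loop described above can be traced in a toy sketch. The stage descriptions and step count are invented for illustration; no real failure model is implied. The sketch simply shows that the cycle has no natural stopping point: after the last stage it feeds back into the first.

```python
# Toy trace of the failure-propagation cycle: each stage's failure
# becomes the trigger for the next, and the cycle can repeat.
CYCLE = [
    "hardware fault generates spurious signals",
    "software produces unexpected outputs",
    "operator becomes stressed and makes errors",
    "erroneous inputs add processor load and stress the hardware",
]

def trace(steps):
    """List the first `steps` stages of the repeating cycle."""
    return [CYCLE[i % len(CYCLE)] for i in range(steps)]

for stage in trace(6):
    print(stage)
```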

The reliability of a system depends on the context in which that system is used. However, the system's environment cannot be completely specified, and it is often impossible for the system designers to limit the environment for operational systems. Different systems operating within an environment may react to problems in unpredictable ways, thus affecting the reliability of all of these systems.

For example, say a system is designed to operate at normal room temperature. To allow for variations and exceptional conditions, the electronic components of a system are designed to operate within a certain range of temperatures, say, from 0 degrees to 40 degrees Celsius. Outside this temperature range, the components will behave in an unpredictable way. Now assume that this system is installed close to an air conditioner. If this air conditioner fails and vents hot gas over the electronics, then the system may overheat. The components, and hence the whole system, may then fail.

Figure 19.6 Failure propagation. An initial failure in the hardware propagates to the software and then to system operation, with failure consequences at each level.

If this system had been installed elsewhere in that environment, this problem would not have occurred. When the air conditioner worked properly, there were no problems. However, because of the physical closeness of these machines, an unanticipated relationship existed between them that led to system failure.

Like reliability, emergent properties such as performance or usability are hard to assess but can be measured after the system is operational. Properties such as safety and security, however, are not directly measurable. Here, you are not simply concerned with attributes that relate to the behavior of the system but also with unwanted or unacceptable behavior.

A secure system is one that does not allow unauthorized access to its data. Unfortunately, it is clearly impossible to predict all possible modes of access and explicitly forbid them. Therefore, it may only be possible to assess these "shall not" properties after the system is operational. That is, you only know that a system is insecure when someone manages to penetrate the system.

19.1.2 Non-determinism

A deterministic system is one that is absolutely predictable. If we ignore issues of concurrency, software systems that run on reliable hardware are deterministic. When they are presented with a sequence of inputs, they will always produce the same sequence of outputs. Of course, there is no such thing as completely reliable hardware, but hardware is usually reliable enough to think of hardware systems as deterministic.

People, on the other hand, are non-deterministic. When presented with exactly the same input (say a request to complete a task), their responses will depend on their emotional and physical state, the person making the request, other people in the environment, and whatever else they are doing. Sometimes they will be happy to do the work, and, at other times, they will refuse; sometimes they will perform a task well, and sometimes they will do it badly.
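The contrast can be sketched in code. This is a minimal illustration: the random draw simply stands in for everything outside the input (mood, workload, context) that shapes a human response, and the function names are invented.

```python
import random

def deterministic_component(x):
    # A pure function: the same input always yields the same output.
    return x * x

def humanlike_component(x, rng=random):
    # The response depends on state outside the input, modeled here
    # by a random draw standing in for the operator's current state.
    return x * x if rng.random() < 0.9 else "task refused"

# The deterministic component is repeatable; the human-like one
# offers no such guarantee from one call to the next.
assert deterministic_component(4) == deterministic_component(4)
```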

Sociotechnical systems are nondeterministic partly because they include people and partly because changes to the hardware, software, and data in these systems are so frequent. The interactions between these changes are complex, and so the behavior of the system is unpredictable. Users do not know when and why changes have been made, so they see the system as nondeterministic.

For example, say a system is presented with a set of 20 test inputs. It processes these inputs, and the results are recorded. At some later time, the same 20 test inputs are processed, and the results are compared to the previous stored results. Five of them are different. Does this mean that there have been five failures? Or are the differences simply reasonable variations in the system's behavior? You can only find this out by looking at the results in more depth and making judgments about the way the system has handled each input.
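This kind of comparison can be sketched as a simple regression check. The input names and outputs below are invented. The point is that the code can count the differences, but deciding whether each difference is a failure or an acceptable variation remains a human judgment.

```python
def differing_inputs(baseline, rerun):
    """Return the inputs whose recorded outputs differ between runs."""
    return [inp for inp in baseline if baseline[inp] != rerun[inp]]

# 20 recorded input/output pairs from the first run (invented data).
baseline = {f"input-{n}": n % 3 for n in range(20)}

# Second run: five of the outputs come back different.
rerun = dict(baseline)
for changed in ("input-2", "input-5", "input-8", "input-11", "input-14"):
    rerun[changed] = "different"

diffs = differing_inputs(baseline, rerun)
print(len(diffs))  # 5 differences to investigate, not necessarily 5 failures
```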

Non-determinism is often seen as a bad thing, and it is felt that designers should try to avoid nondeterministic behavior wherever possible. In fact, in sociotechnical systems, non-determinism has important benefits. It means that the behavior of a system is not fixed for all time but can change depending on the system's environment. For example, operators may observe that a system is showing signs of failure. Instead of using the system normally, they can change their behavior to diagnose and recover from the detected problems.

19.1.3 Success criteria

Generally, complex sociotechnical systems are developed to tackle "wicked problems" (Rittel and Webber 1973). A wicked problem is a problem that is so complex and that involves so many related entities that there is no definitive problem specification. Different stakeholders see the problem in different ways, and no one has a full understanding of the problem as a whole. The true nature of the problem may only emerge as a solution is developed.

An extreme example of a wicked problem is emergency planning to deal with the aftermath of an earthquake. No one can accurately predict where the epicenter of an earthquake will be, what time it will occur, or what effect it will have on the local environment. It is impossible to specify in detail how to deal with the problem. System designers have to make assumptions, but understanding what is required emerges only after the earthquake has happened.

This makes it difficult to define the success criteria for a system. How do you decide if a new system contributes to the business goals of the company that paid for the system?

The judgment of success is not usually made against the original reasons for procuring and developing the system. Rather, it is based on whether or not the system is effective at the time it is deployed. As the business environment can change very quickly, the business goals may have changed significantly during the development of the system.

The situation is even more complex when there are multiple conflicting goals that are interpreted differently by different stakeholders. For instance, the system on which the Mentcare system is based was designed to support two separate business goals:

1. To improve the quality of care for sufferers from mental illness.

2. To improve the cost-effectiveness of treatments by providing managers with detailed reports of care provided and the costs of that care.


Unfortunately, these proved to be conflicting goals because the information that was needed to satisfy the reporting goal meant that doctors and nurses had to provide additional information, over and above the health records that they normally maintained. This reduced the quality of care for patients, as it meant that clinical staff had less time to talk with them. From a doctor's perspective, this system was not an improvement on the previous manual system, but from a manager's perspective, it was.

Thus, any success criteria that are established in the early stages of the systems engineering process have to be regularly reconsidered during system development and use. You cannot evaluate these criteria objectively, as they depend on the system's effect on its environment and its users. A system may apparently meet its requirements as originally specified but be practically useless because of changes in the environment where it is used.

19.2 Conceptual design

Once an idea for a system has been suggested, conceptual design is the very first thing that you do in the systems engineering process. In the conceptual design phase, you take that initial idea, investigate its feasibility, and develop it to create an overall vision of a system that could be developed. You then have to describe the envisaged system so that nonexperts, such as system users, senior company decision makers, or politicians, can understand what you are proposing.

There is an obvious overlap between conceptual design and requirements engineering. As part of the conceptual design process, you have to imagine how the proposed system will be used. This may involve discussions with potential users and other stakeholders, focus groups, and observations of how existing systems are used. The goal of these activities is to understand how users work, what is important to them, and what practical constraints on the system there might be.

The importance of establishing a vision of a proposed system is rarely mentioned in the software design and requirements literature. However, this vision has been part of the systems engineering process for military systems for many years. Fairley et al. (Fairley, Thayer, and Bjorke 1994) discuss the idea of concept analysis and the documentation of the results of concept analysis in a "Concept of Operations" (ConOps) document. This idea of developing a ConOps document is now widely used for large-scale systems, and you can find many examples of ConOps documents on the web.

Unfortunately, as is so often the case with military and government systems, good ideas can become mired in bureaucracy and inflexible standards. This is exactly what happened with ConOps, and a ConOps document standard was proposed (IEEE 2007). As Mostashari et al. say (Mostashari et al. 2012), this tends to lead to long and unreadable documents, which do not serve their intended purpose. They propose a more agile approach to the development of a ConOps document, with a shorter and more flexible document as the output of the process.

Figure 19.7 Conceptual design activities: concept formulation, problem understanding, system proposal development, feasibility study, system structure development, and production of the system vision document.

I don't like the term Concept of Operations, partly because of its military connotations and partly because I think that a conceptual design document is not just about system operation. It should also present the system engineer's understanding of why the system is being developed, an explanation of why the design proposals are appropriate, and, sometimes, an initial organization for the system. As Fairley says, "It should be organized to tell a story," that is, written so that people without a technical background can understand the proposals that are being made.

Figure 19.7 shows activities that may be part of the conceptual design process. Conceptual design should always be a team process that involves people from different backgrounds. I was part of the conceptual design team for the digital learning environment, introduced in Chapter 1. For the digital learning environment, the design team included teachers, education researchers, software engineers, system administrators, and system managers.

Concept formulation is the first stage of the process, where you try to refine an initial statement of needs and work out what type of system would be best to meet the needs of system stakeholders. Initially, we were tasked with proposing an intranet for information sharing across schools that was easier to use than the current system. However, after discussions with teachers, we discovered that this was not really what was required. The existing system was awkward to use, but people had found workarounds. What was really required was a flexible digital learning environment that could be adapted by adding subject- and age-specific tools and content that are freely available on the Internet.

We discovered this because the concept formulation activity overlapped with the activity of problem understanding. To understand a problem, you need to discuss with users and other stakeholders how they do their work. You need to find out what is important to them, what are the barriers that stop them from doing what they want to do, and their ideas of what changes are required. You need to be open-minded (it is their problem, not yours) and to be prepared to change your ideas when the reality does not match your initial vision.


In the system proposal development stage, the conceptual design team sets out ideas for alternative systems, and these are the basis for a feasibility study to decide which of the ideas are worth further development. In a feasibility study, you should look at comparable systems that have been developed elsewhere and technological issues (e.g., use of mobile devices) that may affect use of the system. Then you need to assess whether or not the system could be implemented using current hardware and software technologies.

I have found that an additional useful activity is to develop an outline structure or architecture for the system. This activity is helpful both for making a feasibility assessment and for providing a basis for more detailed requirements engineering and architectural design. Furthermore, as the majority of systems are now assembled from existing systems and components, an initial architecture means that the key parts of the system have been identified and can be procured separately. This approach is often better than procuring a system as a monolithic unit from a single supplier.

For the digital learning environment, we decided on a layered service architecture (shown in Figure 1.8). All components in the system should be considered to be replaceable services. In this way, users can replace a standard service with their preferred alternative and so adapt the system to the ages and interests of the students learning with the system.
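The idea of replaceable services can be sketched in code. The following Python fragment is a minimal illustration of the design principle only; the service names (BlogService, StandardBlog, SchoolBlog) are invented for this sketch and are not part of the actual iLearn design. Any service that conforms to a slot's interface can be substituted without changing the client code.

```python
from typing import Protocol


class BlogService(Protocol):
    """Interface that any blog service plugged into the environment must satisfy."""

    def publish(self, author: str, text: str) -> str: ...


class StandardBlog:
    """Default blog service supplied with the environment."""

    def publish(self, author: str, text: str) -> str:
        return f"[standard] {author}: {text}"


class SchoolBlog:
    """A school's preferred alternative, substituted without changing clients."""

    def publish(self, author: str, text: str) -> str:
        return f"[school] {author}: {text}"


class LearningEnvironment:
    """Holds named service slots; any slot can be replaced by a conforming service."""

    def __init__(self) -> None:
        self.services: dict[str, BlogService] = {"blog": StandardBlog()}

    def replace_service(self, name: str, service: BlogService) -> None:
        self.services[name] = service

    def publish_post(self, author: str, text: str) -> str:
        return self.services["blog"].publish(author, text)


env = LearningEnvironment()
print(env.publish_post("Jill", "Celtic art designs"))  # uses the standard service
env.replace_service("blog", SchoolBlog())              # swap in the preferred alternative
print(env.publish_post("Jill", "Celtic art designs"))  # same client code, new service
```

The point of the sketch is that replacement happens at the architectural seam (the service slot), so adapting the system to a particular school needs no changes to the code that uses the service.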

All of these activities generate information that is used to develop the system vision document. This is a critical document that senior decision makers use to decide whether or not further development of the system should go ahead. It is also used to develop further documents, such as a risk analysis and budget estimate, which are also important inputs to the decision-making process.

Managers use the system vision document to understand the system; a procurement team uses it to define a tender document; and requirements engineers use it as a basis for refining the system requirements. Because these different people need different levels of detail, I suggest that the document should be structured into two parts:

1. A short summary for senior decision makers that presents the key points of the problem and the proposed system. It should be written so that readers can immediately see how the system will be used and the benefits that it will provide.

2. A number of appendices that develop the ideas in more detail and that can be used in the system procurement and requirements engineering activities.

It is challenging to write a summary of the system vision because the readers are busy people who are unlikely to have a technical background. I have found that user stories are very effective, providing a tangible vision of system use that nontechnical people can relate to. Stories should be short and personalized, and should be a feasible description of the use of the system, as shown in Figure 19.8. There is another example of a user story from the same system in Chapter 4 (Figure 4.9).

566 Chapter 19 Systems engineering

Digital art

Jill is an S2 pupil at a secondary school in Dundee. She has a smartphone of her own, and the family has a shared Samsung tablet and a Dell laptop computer. At school, Jill signs on to the school computer and is presented with a personalized Glow+ environment, which includes a range of services, some chosen by her teachers and some she has chosen herself from the Glow app library.

She is working on a Celtic art project, and she uses Google to research a range of art sites. She sketches out some designs on paper and then uses the camera on her phone to photograph what she has done; she uploads this using the school wifi to her personal Glow+ space. Her homework is to complete the design and write a short commentary on her ideas.

At home, she uses the family tablet to sign on to Glow+, and she then uses an artwork app to process her photograph and to extend the work, add color, and so on. She finishes this part of the work, and to complete it she moves to her home laptop to type up her commentary. She uploads the finished work to Glow+ and sends a message to her art teacher that it is available for review. Her teacher looks at the project in a free period before Jill's next art class using a school tablet, and, in class, she discusses the work with Jill.

After the discussion, the teacher and Jill decide that the work should be shared, and so they publish it to the school web pages that show examples of students' work. In addition, the work is included in Jill's e-portfolio, her record of schoolwork from age 3 to 18.

Figure 19.8 A user story used in a system vision document

User stories are effective because, as already noted, readers can relate to them; in addition, they can show the capabilities of the proposed system in an easily accessible way. Of course, these are only part of a system vision, and the summary must also include a high-level description of the basic assumptions made and the ways in which the system will deliver value to the organization.

19.3 System procurement

System procurement or system acquisition is a process whose outcome is a decision to buy one or more systems from system suppliers. At this stage, decisions are made on the scope of a system that is to be purchased, system budgets and timescales, and high-level system requirements. Using this information, further decisions are then made on whether to procure a system, the type of system required, and the supplier or suppliers of the system. The drivers for these decisions are:

1. The replacement of other organizational systems. If the organization has a mixture of systems that cannot work together or that are expensive to maintain, then procuring a replacement system, with additional capabilities, may lead to significant business benefits.

2. The need to comply with external regulations. Increasingly, businesses are regulated and have to demonstrate compliance with externally defined regulations (e.g., Sarbanes–Oxley accounting regulations in the United States). Compliance may require the replacement of noncompliant systems or the provision of new systems specifically to monitor compliance.

3. External competition. If a business needs to compete more effectively or maintain a competitive position, managers may decide to buy new systems to improve business efficiency or effectiveness. For military systems, the need to improve capability in the face of new threats is an important reason for procuring new systems.

4. Business reorganization. Businesses and other organizations frequently restructure with the intention of improving efficiency and/or customer service. Reorganizations lead to changes in business processes that require new systems support.

5. Available budget. The budget that is available is an obvious factor in determining the scope of new systems that can be procured.

In addition, new government systems are often procured to reflect political changes and political policies. For example, politicians may decide to buy new surveillance systems, which they claim will counter terrorism. Buying such systems shows voters that they are taking action.

Large complex systems are usually engineered using a mixture of off-the-shelf and specially built components. They are often integrated with existing legacy systems and organizational databases. When legacy systems and off-the-shelf systems are used, new custom software may be needed to integrate these components. The new software manages the component interfaces so that these components can interoperate. The need to develop this "glueware" is one reason why the savings from using off-the-shelf components are sometimes not as great as anticipated.
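The role of such "glueware" can be illustrated with a small adapter. This Python sketch is illustrative only: the component names and record formats are invented, not taken from any real system. The adapter translates between an interface a legacy system actually offers and the interface an off-the-shelf component expects, so neither has to change.

```python
class LegacyStudentRecords:
    """Existing legacy system: exposes records as semicolon-separated strings."""

    def fetch(self, student_id: str) -> str:
        return f"{student_id};Jill;S2"


class NewPortal:
    """Off-the-shelf component: expects records as dictionaries."""

    def show(self, record: dict) -> str:
        return f"{record['name']} ({record['year']})"


class RecordsAdapter:
    """Glueware: translates the legacy format into what the new component expects."""

    def __init__(self, legacy: LegacyStudentRecords) -> None:
        self.legacy = legacy

    def get_record(self, student_id: str) -> dict:
        sid, name, year = self.legacy.fetch(student_id).split(";")
        return {"id": sid, "name": name, "year": year}


portal = NewPortal()
adapter = RecordsAdapter(LegacyStudentRecords())
print(portal.show(adapter.get_record("s001")))  # Jill (S2)
```

Even in this tiny example, the adapter is extra code that has to be designed, written, and tested, which is exactly where some of the anticipated off-the-shelf savings go.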

Three types of systems or system components may have to be procured:

1. Off-the-shelf applications that may be used without change and that need only minimal configuration for use.

2. Configurable application or ERP systems that have to be modified or adapted for use, either by modifying the code or by using inbuilt configuration features such as process definitions and rules.

3. Custom systems that have to be specially designed and implemented for use.

Each of these components tends to follow a different procurement process. Figure 19.9 illustrates the main features of the procurement process for these types of system.

Figure 19.9 System procurement processes:
Off-the-shelf systems: Conceptual design → Assess applications required → Select approved system → Place order for system
Configurable systems: Conceptual design → Market survey → Choose system shortlist → Choose system supplier → Negotiate contract, with "Refine requirements" and "Modify requirements" as feedback steps
Custom systems: Conceptual design → Define requirements → Issue request for tender → Choose system supplier → Negotiate contract, with "Modify requirements" as a feedback step

Key issues that affect procurement processes are:

1. Organizations often have an approved and recommended set of application software that has been checked by the IT department. It is usually possible to buy or acquire open-source software from this set directly, without the need for detailed justification. For example, in the iLearn system, we recommended that Wordpress should be made available for student and staff blogs. If microphones are needed, off-the-shelf hardware can be bought. There are no detailed requirements, and the users adapt to the features of the chosen application.

2. Off-the-shelf components do not usually match requirements exactly, unless the requirements have been written with these components in mind. Therefore, choosing a system means that you have to find the closest match between the system requirements and the facilities offered by off-the-shelf systems. ERP and other large-scale application systems usually fall into this category. You may then have to modify the requirements to fit in with the system assumptions. This can have knock-on effects on other subsystems. You also usually have an extensive configuration process to tailor and adapt the application or ERP system to the buyer's working environment.

3. When a system is to be built specially, the specification of requirements is part of the contract for the system being acquired. It is therefore a legal as well as a technical document. The requirements document is critical, and procurement processes of this type usually take a considerable amount of time.

4. For public sector systems in particular, there are detailed rules and regulations that affect the procurement of systems. For example, in the European Union, all public sector systems over a certain price must be open to tender by any supplier in Europe. This requires detailed tender documents to be drawn up and the tender to be advertised across Europe for a fixed period of time. Not only does this rule slow down the procurement process, it also tends to inhibit agile development. It forces the system buyer to develop requirements so that all companies have enough information to bid for the system contract.

5. For application systems that require change, or for custom systems, there is usually a contract negotiation period when the customer and supplier negotiate the terms and conditions for development of the system. Once a system has been selected, you may negotiate with the supplier on costs, license conditions, possible changes to the system, and other contractual issues. For custom systems, negotiations are likely to involve payment schedules, reporting, acceptance criteria, requirements change requests, and costs of system changes. During this process, requirements changes may be agreed that will reduce the overall costs and avoid some development problems.

Complex sociotechnical systems are rarely developed "in house" by the buyer of the system. Rather, external systems companies are invited to bid for the systems engineering contract. The customer's business is not systems engineering, so its employees do not have the skills needed to develop the systems themselves. For complex hardware/software systems, it may be necessary to use a group of suppliers, each with a different type of expertise.

For large systems, such as an air traffic management system, a group of suppliers may form a consortium to bid for a contract. The consortium should include all of the capabilities required for this type of system. For an ATC system, this would include computer hardware suppliers, software companies, peripheral suppliers, and suppliers of specialist equipment such as radar systems.

Customers do not usually wish to negotiate with multiple suppliers, so the contract is usually awarded to a principal contractor, who coordinates the project. The principal contractor coordinates the development of different subsystems by subcontractors. The subcontractors design and build parts of the system to a specification that is negotiated with the principal contractor and the customer. Once completed, the principal contractor integrates these components and delivers them to the customer.

Decisions made at the procurement stage of the systems engineering process are critical for later stages in that process. Poor procurement decisions often lead to problems such as late delivery of a system and development of systems that are unsuited to their operational environment. If the wrong system or the wrong supplier is chosen, then the technical processes of system and software engineering become more complex.

For example, I studied a system "failure" where a decision was made to choose an ERP system because this would "standardize" operations across the organization. These operations were very diverse, and it turned out there were good reasons for this. Standardization was practically impossible. The ERP system could not be adapted to cope with this diversity. It was ultimately abandoned after incurring costs of around £10 million.

Decisions and choices made during system procurement have a profound effect on the security and dependability of a system. For example, if a decision is made to procure an off-the-shelf system, then the organization has to accept that it has no influence over the security and dependability requirements of this system. System security depends on decisions made by system vendors. In addition, off-the-shelf systems may have known security weaknesses or may require complex configuration. Configuration errors, where entry points to the system are not properly secured, are a significant source of security problems.

On the other hand, a decision to procure a custom system means that a lot of effort must be devoted to understanding and defining security and dependability requirements. If a company has limited experience in this area, this is quite a difficult thing to do. If the required level of dependability, as well as acceptable system performance, is to be achieved, then the development time may have to be extended and the budget increased.

Many bad procurement decisions stem from political rather than technical causes. Senior management may wish to have more control and so demand that a single system is used across an organization. Suppliers may be chosen because they have a long-standing relationship with a company rather than because they offer the best technology. Managers may wish to maintain compatibility with existing systems because they feel threatened by new technologies. As I discuss in Chapter 20, people who do not understand the required system are often responsible for procurement decisions. Engineering issues do not necessarily play a major part in their decision-making process.

19.4 System development

System development is a complex process in which the elements that are part of the system are developed or purchased and then integrated to create the final system. The system requirements are the bridge between the conceptual design and the development processes. During conceptual design, business and high-level functional and non-functional system requirements are defined. You can think of this as the start of development, hence the overlapping processes shown in Figure 19.1. Once contracts for the system elements have been agreed, more detailed requirements engineering takes place.

Figure 19.10 is a model of the systems development process. Systems engineering processes usually follow a "waterfall" process model similar to the one that I discussed in Chapter 2. Although the waterfall model is inappropriate for most types of software development, higher-level systems engineering processes are plan-driven processes that still follow this model.

Figure 19.10 The systems development process: Requirements engineering → Architectural design → Requirements partitioning → Subsystem engineering → System integration → System testing → System deployment

Plan-driven processes are used in systems engineering because different elements of the system are independently developed. Different contractors are working concurrently on separate subsystems. Therefore, the interfaces to these elements have to be designed before development begins. For systems that include hardware and other equipment, changes during development can be very expensive or, sometimes, practically impossible. It is essential, therefore, that the system requirements are fully understood before hardware development or building work begins.

One of the most confusing aspects of systems engineering is that companies use different terminology for each stage of the process. Sometimes, requirements engineering is part of the development process, and sometimes it is a separate activity. However, after conceptual design, there are seven fundamental development activities:

1. Requirements engineering is the process of refining, analyzing, and documenting the high-level and business requirements identified in the conceptual design. I have covered the most important requirements engineering activities in Chapter 4.

2. Architectural design overlaps significantly with the requirements engineering process. The process involves establishing the overall architecture of the system, identifying the different system components, and understanding the relationships between them.

3. Requirements partitioning is concerned with deciding which subsystems (identified in the system architecture) are responsible for implementing the system requirements. Requirements may have to be allocated to hardware, software, or operational processes and prioritized for implementation. Ideally, you should allocate requirements to individual subsystems so that the implementation of a critical requirement does not need subsystem collaboration. However, this is not always possible. At this stage, you also decide on the operational processes and on how these are used in the requirements implementation.

4. Subsystem engineering involves developing the software components of the system; configuring off-the-shelf hardware and software; designing, if necessary, special-purpose hardware; defining the operational processes for the system; and redesigning essential business processes.

5. System integration is the process of putting together system elements to create a new system. Only then do the emergent system properties become apparent.

6. System testing is an extended activity where the whole system is tested and problems are exposed. The subsystem engineering and system integration phases are reentered to repair these problems, tune the performance of the system, and implement new requirements. System testing may involve both testing by the system developer and acceptance/user testing by the organization that has procured the system.

7. System deployment is the process of making the system available to its users, transferring data from existing systems, and establishing communications with other systems in the environment. The process culminates with a "go live," after which users start to use the system to support their work.
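The requirements partitioning activity can be pictured as a simple allocation table plus a consistency check. The following Python sketch is not from the book; the requirement and subsystem names are invented for illustration. It records which subsystems each requirement is allocated to and flags critical requirements whose implementation would need subsystem collaboration.

```python
# Allocation of requirements to subsystems (all names are hypothetical).
allocation = {
    "R1: authenticate users":        {"subsystems": ["user-management"], "critical": True},
    "R2: store student artwork":     {"subsystems": ["storage"], "critical": False},
    "R3: publish work to web pages": {"subsystems": ["web-portal", "storage"], "critical": False},
}


def crosscutting_critical(alloc: dict) -> list:
    """Return critical requirements whose implementation needs subsystem collaboration."""
    return [req for req, info in alloc.items()
            if info["critical"] and len(info["subsystems"]) > 1]


# Ideally this list is empty: no critical requirement spans subsystems.
print(crosscutting_critical(allocation))  # []
```

A check like this makes the ideal stated above concrete: if a critical requirement maps to more than one subsystem, the partitioning should be revisited or the collaboration explicitly designed.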

Although the overall process is plan-driven, the processes of requirements development and system design are inextricably linked. The requirements and the high-level design are developed concurrently. Constraints posed by existing systems may limit design choices, and these choices may be specified in the requirements. You may have to do some initial design to structure and organize the requirements engineering process. As the design process continues, you may discover problems with existing requirements, and new requirements may emerge. Consequently, you can think of these linked processes as a spiral, as shown in Figure 19.11.

Figure 19.11 Requirements and design spiral: starting at the center ("Start"), each round of the spiral moves through domain and problem understanding, requirements elicitation and analysis, architectural design, requirements partitioning, and review and assessment, producing the system requirements and design documentation.

The spiral reflects the reality that requirements affect design decisions and vice versa, and so it makes sense to interleave these processes. Starting in the center, each round of the spiral may add detail to the requirements and the design. As subsystems are identified in the architecture, decisions are made on the responsibilities of these subsystems for providing the system requirements. Some rounds of the spiral may focus on requirements, others on design. Sometimes, new knowledge collected during the requirements and design process means that the problem statement itself has to be changed.

For almost all systems, many possible designs meet the requirements. These cover a range of solutions that combine hardware, software, and human operations. The solution that you choose for further development may be the most appropriate technical solution that meets the requirements. However, wider organizational and political considerations may influence the choice of solution. For example, a government client may prefer to use national rather than foreign suppliers for its system, even if national products are technically inferior. These influences usually take effect in the review and assessment phase of the spiral model, where designs and requirements may be accepted or rejected. The process ends when a review decides that the requirements and high-level design are sufficiently detailed for subsystems to be specified and designed.


Subsystem engineering involves designing and building the system's hardware and software components. For some types of systems, such as spacecraft, all hardware and software components may be designed and built during the development process. However, in most systems, some components are bought rather than developed. It is usually much cheaper to buy existing products than to develop special-purpose components. However, if you buy large off-the-shelf systems, such as ERP systems, there is a significant cost in configuring these systems for use in their operational environment.

Subsystems are usually developed in parallel. When problems that cut across subsystem boundaries are encountered, a system modification request must be made. Where systems involve extensive hardware engineering, making modifications after manufacturing has started is usually very expensive. Often "workarounds" that compensate for the problem must be found. These workarounds usually involve software changes to implement new requirements.

During systems integration, you take the independently developed subsystems and put them together to make up a complete system. This integration can be achieved using a "big bang" approach, where all the subsystems are integrated at the same time. However, for technical and managerial reasons, an incremental integration process, where subsystems are integrated one at a time, is the best approach:

1. It is usually impossible to schedule the development of all the subsystems so that they are all finished at the same time.

2. Incremental integration reduces the cost of error location. If many subsystems are simultaneously integrated, an error that arises during testing may be in any of these subsystems. When a single subsystem is integrated with an already working system, errors that occur are probably in the newly integrated subsystem or in the interactions between the existing subsystems and the new subsystem.
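The error-location argument can be sketched in code. This Python fragment is purely illustrative: the subsystem names and the simulated test are invented, and real integration testing is of course far more involved. The point is structural: because subsystems are added one at a time and the tests rerun after each addition, a failure immediately narrows the suspect set to the newest subsystem and its interfaces.

```python
# Sketch of incremental integration: add one subsystem at a time and rerun the
# integration tests after each addition. On failure, only the newest subsystem
# (or its interactions with the system so far) is suspect.

def integrate_incrementally(subsystems, run_tests):
    integrated = []
    for name in subsystems:
        integrated.append(name)
        if not run_tests(integrated):
            # Fault localized to the subsystem just added or its interfaces.
            return integrated, name
    return integrated, None


def run_tests(integrated):
    # Simulated integration tests: fail once the (hypothetical) faulty
    # subsystem is part of the integrated system.
    return "radar-interface" not in integrated


system, suspect = integrate_incrementally(
    ["flight-db", "display", "radar-interface", "comms"], run_tests)
print(suspect)  # radar-interface
```

With a big-bang integration of the same four subsystems, the first test failure would leave all four (and all their pairwise interactions) as candidates, which is exactly the extra error-location cost the text describes.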

As an increasing number of systems are built by integrating off-the-shelf hardware and software application systems, the distinction between implementation and integration is becoming blurred. In some cases, there is no need to develop new hardware or software. Essentially, systems integration is the implementation phase of the system.

During and after the integration process, the system is tested. This testing should focus on testing the interfaces between components and the behavior of the system as a whole. Inevitably, testing also reveals problems with individual subsystems that have to be repaired. Testing takes a long time, and a common problem in system development is that the testing team may run out of either budget or time. This problem can lead to the delivery of error-prone systems that need to be repaired after they have been deployed.

Subsystem faults that are a consequence of invalid assumptions about other subsystems are often exposed during system integration. This may lead to disputes between the contractors responsible for implementing different subsystems. When problems are discovered in subsystem interaction, the contractors may argue about which subsystem is faulty. Negotiations on how to solve the problems can take weeks or months.

The final stage of the system development process is system delivery and deployment. The software is installed on the hardware and is readied for operation. This may involve more system configuration to reflect the local environment where it is used, the transfer of data from existing systems, and the preparation of user documentation and training. At this stage, you may also have to reconfigure other systems in the environment to ensure that the new system interoperates with them.

Although system deployment is straightforward in principle, it is often more difficult than anticipated. The user environment may be different from that anticipated by the system developers. Adapting the system to make it work in an unexpected environment can be difficult. The existing system data may require extensive clean-up, and parts of it may involve more effort than expected. The interfaces to other systems may not be properly documented. You may find that the planned operational processes have to be changed because they are not compatible with the operational processes for other systems. User training is often difficult to arrange, with the consequence that, initially at least, users are unable to access the capabilities of the system. System deployment can therefore take much longer and cost much more than anticipated.

19.5 System operation and evolution

Operational processes are the processes that are involved in using the system as intended by its designers. For example, operators of an air traffic control system follow specific processes when aircraft enter and leave airspace, when they have to change height or speed, when an emergency occurs, and so on. For new systems, these operational processes have to be defined and documented during the system development process. Operators may have to be trained, and other work processes adapted, to make effective use of the new system. Undetected problems may arise at this stage because the system specification may contain errors or omissions. While the system may perform to specification, its functions may not meet the real operational needs. Consequently, the operators may not use the system as its designers intended.

Although the designers of operational processes may have based their process designs on extensive user studies, there is always a period of "domestication" (Stewart and Williams 2005) when users adapt to the new system and work out practical processes of how to use it. While user interface design is important, studies have shown that, given time, users can adapt to complex interfaces. As they become experienced, they prefer ways of using the system quickly rather than easily. This means that when designing systems, you should not simply cater for inexperienced users; you should design the user interface to be adaptable for experienced users.

Some people think that system operators are a source of problems in a system and that we should move toward automated systems where operator involvement is minimized. In my opinion, there are two problems with this approach:

1. It is likely to increase the technical complexity of the system because it has to be designed to cope with all anticipated failure modes. This increases the costs and time required to build the system. Provision also has to be made to bring in people to deal with unanticipated failures.

2. People are adaptable and can cope with problems and unexpected situations. Thus, you do not have to anticipate everything that could possibly go wrong when you are specifying and designing the system.

People have a unique capability of being able to respond effectively to the unexpected, even when they have never had direct experience of these unexpected events or system states. Therefore, when things go wrong, the system operators can often recover the situation by finding workarounds and using the system in nonstandard ways. Operators also use their local knowledge to adapt and improve processes. Normally, the actual operational processes are different from those anticipated by the system designers.

Consequently, you should design operational processes to be flexible and adaptable. The operational processes should not be too constraining; they should not require operations to be done in a particular order; and the system software should not rely on a specific process being followed. Operators usually improve the process because they know what does and does not work in a real situation.

A problem that may only emerge after the system goes into operation is the operation of the new system alongside existing systems. There may be physical problems of incompatibility, or it may be difficult to transfer data from one system to another. More subtle problems might arise because different systems have different user interfaces. Introducing a new system may increase the operator error rate, as the operators use user interface commands for the wrong system.

19.5.1 System evolution

Large, complex systems usually have a long lifetime. Complex hardware/

software

systems may remain in use for more than 20 years, even though both the

original

hardware and software technologies used are obsolete. There are several

reasons for

this longevity, as shown in Figure 19.12.

Over their lifetime, large complex systems change and evolve to correct

errors in the original system requirements and to implement new

requirements that have emerged.

The system’s computers are likely to be replaced with new, faster

machines. The organization that uses the system may reorganize itself and

hence use the system in a different way. The external environment of the

system may change, forcing changes to the system. Hence, evolution is a

process that runs alongside normal system operational processes. System

evolution involves reentering the development process to make changes

and extensions to the system’s hardware, software, and operational

processes.

System evolution, like software evolution (discussed in Chapter 9), is

inherently

costly for several reasons:

1. Proposed changes have to be analyzed very carefully from a business

and a tech-

nical perspective. Changes have to contribute to the goals of the system

and

should not simply be technically motivated.

576 Chapter 19 Systems engineering

Figure 19.12 Factors that influence system lifetimes:

Investment cost: The costs of a systems engineering project may be tens or even hundreds of millions of dollars. These costs can only be justified if the system can deliver value to an organization for many years.

Loss of expertise: As businesses change and restructure to focus on their core activities, they often lose engineering expertise. This may mean that they lack the ability to specify the requirements for a new system.

Replacement cost: The cost of replacing a large system is very high. Replacing an existing system can be justified only if this leads to significant cost savings over the existing system.

Return on investment: If a fixed budget is available for systems engineering, spending on new systems in some other area of the business may lead to a higher return on investment than replacing an existing system.

Risks of change: Systems are an inherent part of business operations, and the risks of replacing existing systems with new systems cannot be justified. The danger with a new system is that things can go wrong in the hardware, software, and operational processes. The potential costs of these problems for the business may be so high that they cannot take the risk of system replacement.

System dependencies: Systems are interdependent, and replacing one of these systems may lead to extensive changes in other systems.

2. Because subsystems are never completely independent, changes to one subsystem may have side-effects that adversely affect the performance or behavior of other subsystems. Consequent changes to these subsystems may therefore be needed.

3. The reasons for original design decisions are often unrecorded. Those

responsible for the system evolution have to work out why particular

design decisions were made.

4. As systems age, their structure becomes corrupted by change, so the costs of making further changes increase.

Systems that have been in use for many years are often reliant on obsolete hardware and software technology. These “legacy systems” (discussed in Chapter 9) are sociotechnical computer-based systems that have been developed using technology that is now obsolete. However, they don’t just include legacy hardware and software.

They also rely on legacy processes and procedures—old ways of doing

things that

are difficult to change because they rely on legacy software. Changes to

one part of the system inevitably involve changes to other components.

Changes made to a system during system evolution are often a source of

problems

and vulnerabilities. If the people implementing the changes are different

from those who developed the system, they may be unaware that a design

decision was taken for

dependability and security reasons. Therefore, they may change the

system and lose

some safeguards that were deliberately implemented when the system was

built.

Furthermore, as testing is so expensive, complete retesting may be impossible after every system change. Consequently, testing may not discover the adverse side-effects of changes that introduce or expose faults in other system components.


Key Points

Systems engineering is concerned with all aspects of specifying, buying,

designing, and testing complex sociotechnical systems.

Sociotechnical systems include computer hardware, software, and

people, and are situated within an organization. They are designed to

support organizational or business goals and objectives.

The emergent properties of a system are characteristics of the system as

a whole rather than of its component parts. They include properties such

as performance, reliability, usability, safety, and security.

The fundamental systems engineering processes are conceptual systems

design, system procurement, system development, and system operation.

Conceptual systems design is a key activity where high-level system

requirements and a vision of the operational system are developed.

System procurement covers all of the activities involved in deciding

what system to buy and who should supply that system. Different

procurement processes are used for off-the-shelf application systems,

configurable COTS systems, and custom systems.

System development processes include requirements specification,

design, construction, integration, and testing.

When a system is put into use, the operational processes and the system

itself inevitably change to reflect changes to the business requirements

and the system’s environment.

Further Reading

“Airport 95: Automated Baggage System.” An excellent, readable case

study of what can go wrong with a systems engineering project and how

software tends to get the blame for wider systems failures. (ACM Software Engineering Notes, 21, March 1996). http://doi.acm.org/10.1145/227531.227544

“Fundamentals of Systems Engineering.” This is the introductory chapter

in NASA’s systems engineering handbook. It presents an overview of the

systems engineering process for space systems.

Although these are mostly technical systems, there are sociotechnical

issues to be considered.

Dependability is obviously critically important. (In NASA Systems Engineering Handbook, NASA-SP-2007-6105, 2007). http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20080008301_2008008500.pdf

The LSCITS Socio-technical Systems Handbook. This handbook introduces

sociotechnical systems in an accessible way and provides access to more

detailed papers on sociotechnical topics. (Various

authors, 2012). http://archive.cs.st-andrews.ac.uk/STSE-Handbook

Architecting systems: Concepts, Principles and Practice. This is a refreshingly

different book on systems engineering that does not have the hardware

focus of many “traditional” systems engineering books.


The author, who is an experienced systems engineer, draws on examples

from a wide range of systems and recognizes the importance of

sociotechnical as well as technical issues. (H. Sillitto, College Publications,

2014).

Website

PowerPoint slides for this chapter:

www.pearsonglobaleditions.com/Sommerville

Links to supporting videos:

http://software-engineering-book.com/videos/systems-engineering/

Exercises

19.1. Give two examples of government functions that are supported by

complex sociotechnical systems and explain why, in the foreseeable

future, these functions cannot be completely automated.

19.2. Explain briefly why the involvement of a range of professional

disciplines is essential in systems engineering.

19.3. Complex sociotechnical systems lead to three important

characteristics. What are they?

Explain each in brief.

19.4. What is a “wicked problem”? Explain why the development of a

national medical records system should be considered a “wicked

problem.”

19.5. A multimedia virtual museum system offering virtual experiences of

ancient Greece is to be developed for a consortium of European museums.

The system should provide users with the facility to view 3-D models of

ancient Greece through a standard web browser and should also support

an immersive virtual reality experience. Develop a conceptual design for

such a system, highlighting its key characteristics and essential high-level

requirements.

19.6. Explain why you need to be flexible and adapt system requirements

when procuring large off-the-shelf software systems, such as ERP systems.

Search the web for discussions of the failures of such systems and explain,

from a sociotechnical perspective, why these failures

occurred. A possible starting point is: http://blog.360cloudsolutions.com/blog/bid/94028/Top-Six-ERP-Implementation-Failures

19.7. Why is system integration a particularly critical part of the systems

development process?

Suggest three sociotechnical issues that may cause difficulties in the

system integration process.

19.8. Why is system evolution inherently costly?


19.9. What are the arguments for and against considering system

engineering as a profession in its own right, like electrical engineering or

software engineering?

19.10. You are an engineer involved in the development of a financial

system. During installation, you discover that this system will make a

significant number of people redundant. The people in the environment

deny you access to essential information to complete the system

installation. To what extent should you, as a systems engineer, become

involved in this situation? Is it your professional responsibility to complete

the installation as contracted? Should you simply abandon the work until

the procuring organization has sorted out the problem?

References

Baxter, G., and I. Sommerville. 2011. “Socio-Technical Systems: From

Design Methods to Systems Engineering.” Interacting with Computers 23 (1):

4–17. doi:10.1016/j.intcom.2010.07.003.

Checkland, P. 1981. Systems Thinking, Systems Practice. Chichester, UK:

John Wiley & Sons.

Fairley, R. E., R. H. Thayer, and P. Bjorke. 1994. “The Concept of

Operations: The Bridge from Operational Requirements to Technical

Specifications.” In 1st Int. Conf. on Requirements Engineering, 40–7.

Colorado Springs, CO. doi:10.1109/ICRE.1994.292405.

IEEE. 2007. “IEEE Guide for Information Technology. System Definition—

Concept of Operations (ConOps) Document.” Electronics. Vol. 1998.

doi:10.1109/IEEESTD.1998.89424. http://ieeexplore.ieee.org/servlet/opac?punumber=6166

Mostashari, A., S. A. McComb, D. M. Kennedy, R. Cloutier, and P.

Korfiatis. 2012. “Developing a Stakeholder-Assisted Agile CONOPS

Development Process.” Systems Engineering 15 (1): 1–13.

doi:10.1002/sys.20190.

Rittel, H., and M. Webber. 1973. “Dilemmas in a General Theory of

Planning.” Policy Sciences 4: 155–169. doi:10.1007/BF01405730.

Stevens, R., P. Brook, K. Jackson, and S. Arnold. 1998. Systems Engineering:

Coping with Complexity. London: Prentice-Hall.

Stewart, J., and R. Williams. 2005. “The Wrong Trousers? Beyond the

Design Fallacy: Social Learning and the User.” In User Involvement in

Innovation Processes. Strategies and Limitations from a Socio-Technical

Perspective, edited by H. Rohrache, 39–71. Berlin: Profil-Verlag.

Thayer, R. H. 2002. “Software System Engineering: A Tutorial.” IEEE

Computer 35 (4): 68–73.

doi:10.1109/MC.2002.993773.

White, S., M. Alford, J. Holtzman, S. Kuehl, B. McCay, D. Oliver, D.

Owens, C. Tully, and A. Willey.

1993. “Systems Engineering of Computer-Based Systems.” IEEE Computer

26 (11): 54–65.

doi:10.1109/ECBS.1994.331687.

20

Systems of systems

Objectives

The objectives of this chapter are to introduce the idea of a system of

systems and to discuss the challenges of building complex systems of

software systems. When you have read this chapter, you will:

understand what is meant by a system of systems and how it

differs from an individual system;

understand systems of systems classification and the differences

between different types of systems of systems;

understand why conventional methods of software engineering

that are based on reductionism are inadequate for developing

systems of systems;

have been introduced to the systems of systems engineering

process and architectural patterns for systems of systems.

Contents

20.1 System complexity

20.2 Systems of systems classification

20.3 Reductionism and complex systems

20.4 Systems of systems engineering

20.5 Systems of systems architecture


We need software engineering because we create large and complex

software

systems. The discipline emerged in the 1960s because the first attempts to

build

large software systems mostly went wrong. Creating software was much

more

expensive than expected, took longer than planned, and the software itself

was often unreliable. To address these problems, we have developed a

range of software engineering techniques and technologies, which have

been remarkably successful. We

can now build systems that are much larger, more complex, much more

reliable, and

more effective than the software systems of the 1970s.

However, we have not “solved” the problems of large system engineering.

Software project failures are still common. For example, there have been

serious

problems and delays in the implementation of government health care

systems in

both the United States and the UK. The root cause of these problems is, as

it was in the 1960s, that we are trying to build systems that are larger and

more complex than before. We are attempting to build these “mega-systems” using methods and technology that were never designed for this

purpose. As I discuss later in the chapter, I believe that current software

engineering technology cannot scale up to cope with

the complexity that is inherent in many of the systems now being

proposed.

The increase in size of software systems since the introduction of software engineering has been remarkable. Today’s large systems may be a hundred or even a thousand times larger than the “large” systems of the 1960s. Northrop and her colleagues (Northrop et al. 2006) suggested in 2006 that we would shortly see the development of systems with a billion lines of code. Almost 10 years after this prediction, I suspect such systems are already in use.

Of course, we do not start with nothing and then write a billion lines of

code. As

I discussed in Chapter 15, the real success story of software engineering

has been

software reuse. It is only because we have developed ways of reusing

software

across applications and systems that large-scale development is possible.

Very large-scale systems now and in the future will be built by integrating

existing systems

from different providers to create systems of systems (SoS).

What do we mean when we talk about a system of systems? As Hitchins says (Hitchins 2009), from a general systems perspective, there is no difference between a system and a system of systems. Both have emergent properties and can be composed from subsystems. However, from a software engineering

perspective, I think

there is a useful distinction between these terms. This distinction is

sociotechnical rather than technical:

A system of systems is a system that contains two or more independently

managed elements.

This means that there is no single manager for all of the parts of the

system of

systems and that different parts of a system are subject to different

management and control policies and rules. As we shall see, distributed

management and control has

a profound effect on the overall complexity of the system.

This definition of systems of systems says nothing about the size of

systems of

systems. A relatively small system that includes services from different

providers is


a system of systems. Some of the problems of SoS engineering apply to

such small

systems, but the real challenges emerge when the constituent systems are

themselves

large-scale systems.

Much of the work in the area of systems of systems has come from the

defense

community. As the capability of software systems increased in the late

20th century, it became possible to coordinate and control previously

independent military

systems, such as naval and ground-based air and ship defense systems. The

system

might include tens or hundreds of separate elements, with software

systems keeping

track of these elements and providing controllers with information that

allows them

to be deployed most effectively.

This type of system of systems is outside the scope of a software

engineering

book. Instead, I focus here on systems of systems where the system

elements are

software systems rather than hardware such as aircraft, military vehicles,

or radars.

Systems of software systems are created by integrating separate software

systems,

and, at the time of writing, most software SoS include a relatively small

number of

separate systems. Each constituent system is usually a complex system in

its own

right. However, it is predicted that, over the next few years, the size of

software SoS

is likely to grow significantly as more and more systems are integrated to

make use

of the capabilities that they offer.

Examples of systems of systems of software systems are:

1. A cloud management system that handles local private cloud

management and

management of servers on public clouds such as Amazon and Microsoft.

2. An online banking system that handles loan requests and that connects

to a

credit reference system provided by credit reference agencies to check the

credit

of applicants.

3. An emergency information system that integrates information from

police,

ambulance, fire, and coast guard services about the assets available to deal

with

civil emergencies such as flooding and large-scale accidents.

4. The digital learning environment (iLearn) that I introduced in Chapter

1.

This system provides a range of learning support by integrating separate

software systems such as Microsoft Office 365, virtual learning environments such as Moodle, simulation modeling tools, and content such as

newspaper archives.
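The loan-request example above can be sketched in a few lines of Python. Everything here is invented for illustration: a stub class stands in for the credit reference agency, whose real interface would be published and controlled by a different organization, and the score threshold is arbitrary.

```python
# Hypothetical sketch of a software SoS: a bank's loan system that
# delegates credit checks to an independently managed credit reference
# system. The agency below is a stub; in a real SoS it would be a
# separately owned system reached over a network interface.

class CreditReferenceAgency:
    """Stand-in for the externally managed credit reference system."""

    def __init__(self, scores):
        self._scores = scores  # applicant name -> credit score

    def credit_score(self, applicant):
        # The "published interface" that other systems depend on.
        return self._scores.get(applicant, 0)


class LoanSystem:
    """The bank's own system, one constituent of the SoS."""

    def __init__(self, agency, threshold=600):  # threshold is arbitrary
        self.agency = agency
        self.threshold = threshold

    def assess(self, applicant):
        score = self.agency.credit_score(applicant)
        return "approve" if score >= self.threshold else "refer"


agency = CreditReferenceAgency({"alice": 720, "bob": 540})
loans = LoanSystem(agency)
print(loans.assess("alice"))  # approve
print(loans.assess("bob"))    # refer
```

Because the agency is independently managed, its interface and scoring policy can change without the bank’s involvement, which is precisely what makes such integrations a systems-of-systems problem rather than ordinary component reuse.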

Maier (Maier 1998) identified five essential characteristics of systems of

systems:

1. Operational independence of elements Parts of the system are not simply

components but can operate as useful systems in their own right. The

systems within

the SoS evolve independently.

2. Managerial independence of elements Parts of the system are “owned” and

managed by different organizations or by different parts of a larger

organization.

Therefore different rules and policies apply to the management and

evolution of


these systems. As I have suggested, this is the key factor that distinguishes

a

system of systems from a system.

3. Evolutionary development SoS are not developed in a single project but

evolve over time from their constituent systems.

4. Emergence SoS have emergent characteristics that only become apparent

after the SoS has been created. Of course, as I have discussed in Chapter

19, emergence is a characteristic of all systems, but it is particularly

important in SoS.

5. Geographical distribution of elements The elements of a SoS are often

geographically distributed across different organizations. This is important

technically

because it means that an externally-managed network is an integral part of

the

SoS. It is also important managerially as it increases the difficulties of communication between those involved in making system management decisions and adds to the difficulties of maintaining system security.†

I would like to add two further characteristics to Maier’s list that are

particularly relevant to systems of software systems:

1. Data intensive A software SoS typically relies on and manages a very

large volume of data. In terms of size, this may be tens or even hundreds

of times

larger than the code of the constituent systems themselves.

2. Heterogeneity The different systems in a software SoS are unlikely to

have been developed using the same programming languages and design

methods. This is

a consequence of the very rapid pace of evolution of software

technologies.

Companies frequently update their development methods and tools as

new,

improved versions become available. In a 20-year lifetime of a large SoS,

technologies may change four or five times.

As I discuss in Section 20.1, these characteristics mean that SoS can be

much

more complex than systems with a single owner and manager. I believe

that our current software engineering methods and techniques cannot scale to cope

with this

complexity. Consequently, problems with the very large and complex

systems that

we are now developing are inevitable. We need a completely new set of

abstractions,

methods, and technologies for software systems of systems engineering.

This need has been recognized independently by a number of different

authorities. In the UK, a report published in 2004 (Royal Academy of Engineering

2004)

led to the establishment of a national research and training initiative in

large-scale complex IT systems (Sommerville et al. 2012). In the United

States, the Software

Engineering Institute reported on Ultra-Large Scale Systems in 2006

(Northrop et al.

2006). From the systems engineering community, Stevens (Stevens 2010)

discusses

the problems of constructing “mega-systems” in transport, health care, and

defense.

†Maier, M. W. 1998. “Architecting Principles for Systems-of-Systems.”

Systems Engineering 1 (4): 267–284. doi:10.1002/(SICI)1520-6858(1998)1:4<267::AID-SYS3>3.0.CO;2-D.


Figure 20.1 Simple and complex systems: System (a) and System (b)

20.1 System complexity

I suggested in the introduction that the engineering problems that arise

when constructing systems of software systems are due to the inherent complexity

of these

systems. In this section, I explain the basis of system complexity and

discuss the

different types of complexity that arise in software SoS.

All systems are composed of parts (elements) with relationships between

these

elements of the system. For example, the parts of a program may be

objects, and the

parts of each object may be constants, variables, and methods. Examples of relationships include “calls” (method A calls method B), “inherits-from” (object X inherits the methods and attributes of object Y), and “part of” (method A is part of

object X).

The complexity of any system depends on the number and types of

relationships

between system elements. Figure 20.1 shows examples of two systems.

System (a) is

a relatively simple system with only a small number of relationships

between its elements. By contrast, System (b), with the same number of elements, is a more complex system because it has many more element–element relationships.
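This dependence of complexity on the number of relationships can be sketched numerically; the six elements and the two relationship sets below are invented purely for illustration:

```python
# Sketch: with the same elements, a system's complexity grows with the
# number of element-element relationships. The elements and relationship
# sets here are invented for illustration.
from itertools import combinations

elements = ["A", "B", "C", "D", "E", "F"]

# System (a): only a few relationships between elements
system_a = {("A", "B"), ("B", "C"), ("D", "E")}

# System (b): the same six elements, with almost every pair related
system_b = set(combinations(elements, 2)) - {("A", "F")}

def density(relationships, elements):
    """Fraction of all possible element pairs that are actually related."""
    possible = len(elements) * (len(elements) - 1) // 2
    return len(relationships) / possible

print(density(system_a, elements))  # 0.2
print(density(system_b, elements))  # about 0.93
```

With identical element counts, System (b) relates fourteen of the fifteen possible pairs while System (a) relates only three, which is the sense in which it is the more complex system.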

The type of relationship also influences the overall complexity of a system.

Static

relationships are relationships that are planned and analyzable from static

depictions of the system. Therefore, the “uses” relationship in a software

system is a static relationship. From either the software source code or a

UML model of a system, you can

work out how any one software component uses other components.

Dynamic relationships are relationships that exist in an executing system.

The

“calls” relationship is a dynamic relationship because, in any system with

if-statements, you cannot tell whether or not one method will call another

method. It depends on

the runtime inputs to the system. Dynamic relationships are more complex

to analyze

as you need to know the system inputs and data used as well as the source

code of

the system.
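The difference between the two kinds of relationship can be seen in a small invented example: static analysis of the source below shows that process() uses both primary() and fallback(), but which of them is actually called is a dynamic relationship that depends on the runtime input:

```python
# The "uses" relationship is static: it is visible in this source text.
# The "calls" relationship is dynamic: it depends on runtime input.

def primary(data):
    return sum(data)

def fallback(data):
    return 0

def process(data):
    # Which branch runs, and hence which method is called, cannot be
    # determined from the source alone; it depends on the input.
    if data:
        return primary(data)
    return fallback(data)

print(process([1, 2, 3]))  # 6: "calls" primary
print(process([]))         # 0: "calls" fallback
```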

As well as system complexity, we also have to consider the complexity of

the

processes used to develop and maintain the system once it has gone into

use. Figure 20.2

illustrates these processes and their relationship with the developed

system.


Figure 20.2 Production and management processes: the production process produces the complex system, and the management process manages it

As systems grow in size, they need more complex production and management processes. Complex processes are themselves complex systems. They are

difficult to understand and may have undesirable emergent properties.

They are more time consuming

than simpler processes, and they require more documentation and

coordination between the people and the organizations involved in the

system development. The complexity of the production process is one of

the main reasons why projects go wrong, with software delivered late and

overbudget. Therefore, large systems are always at risk of cost and time

overruns.

Complexity is important for software engineering because it is the main

influence

on the understandability and the changeability of a system. The more

complex a system, the more difficult it is to understand and analyze. Given that

complexity is a function of the number of relationships between elements

of a system, it is inevitable that large systems are more complex than

small systems. As complexity increases, there

are more and more relationships between elements of the system and an

increased

likelihood that changing one part of a system will have undesirable effects

elsewhere.

Several different types of complexity are relevant to sociotechnical

systems:

1. The technical complexity of the system is derived from the relationships between the different components of the system itself.

2. The managerial complexity of the system is derived from the complexity of the relationships between the system and its managers (i.e., what can managers change in the system) and the relationships between the managers of different parts of the system.


3. The governance complexity of a system depends on the relationships between the laws, regulations, and policies that affect the system and the relationships between the decision-making processes in the organizations responsible for the system. As different parts of the system may be in different organizations and in different countries, different laws, rules, and policies may apply to each system within the SoS.

Governance and managerial complexity are related, but they are not the

same

thing. Managerial complexity is an operational issue—what can and can’t

actually be

done with the system. Governance complexity is associated with the

higher level of

decision-making processes in organizations that affect the system. These

decision-making processes are constrained by national and international laws and

regulations.

For example, say a company decides to allow its staff to access its systems

using

their own mobile devices rather than company-issued laptops. The

decision to allow

this is a governance decision because it changes the policy of the

company. As a

result of this decision, management of the system becomes more complex

as managers have to ensure that the mobile devices are configured properly so that

company

data is secure. The technical complexity of the system also increases as

there is no longer a single implementation platform. Software may have to

be modified to work

on laptops, tablets and phones.

As well as technical complexity, the characteristics of systems of systems

may also

lead to significantly increased managerial and governance complexity.

Figure 20.3

summarizes how the different SoS characteristics primarily contribute to

different

types of complexity:

1. Operational independence The constituent systems in the SoS are subject

to different policies and rules (governance complexity) and ways of

managing the

system (managerial complexity).

2. Managerial independence The constituent systems in the SoS are

managed by different people in different ways. They have to coordinate to

ensure that management changes are consistent (managerial complexity).

Special software may be

needed to support consistent management and evolution (technical

complexity).

3. Evolutionary development contributes to the technical complexity of a

SoS because different parts of the system are likely to be built using

different technologies.

4. Emergence is a consequence of complexity. The more complex a system,

the

more likely it is that it will have undesirable emergent properties. These properties increase the technical complexity of the system as software has to be developed or changed to compensate for them.

5. Geographical distribution increases the technical, managerial, and

governance

complexity in a SoS. Technical complexity is increased because software is

required to coordinate and synchronize remote systems; managerial

complexity

is increased because it is more difficult for managers in different countries

to

coordinate their actions; governance complexity is increased because different parts of the systems may be located in different jurisdictions and so are subject to different laws and regulations.

Figure 20.3 SoS characteristics and system complexity:

SoS characteristic          Technical complexity   Managerial complexity   Governance complexity
Operational independence                           X                       X
Managerial independence     X                      X
Evolutionary development    X
Emergence                   X
Geographical distribution   X                      X                       X
Data-intensive              X                                              X
Heterogeneity               X

6. Data-intensive systems are technically complex because of the

relationships

between the data items. The technical complexity is also likely to be

increased

to cope with data errors and incompleteness. Governance complexity may

be

increased because of different laws governing the use of data.

7. The heterogeneity of a system contributes to its technical complexity

because of the difficulties of ensuring that different technologies used in

different parts of

the system are compatible.
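The contributions described in the list above, and summarized in Figure 20.3, can be written down as a small lookup table. The encoding below is just an illustration of that mapping, not part of the book’s material:

```python
# Which types of complexity each SoS characteristic primarily contributes
# to, following the mapping in Figure 20.3.
COMPLEXITY_CONTRIBUTIONS = {
    "operational independence":  {"managerial", "governance"},
    "managerial independence":   {"technical", "managerial"},
    "evolutionary development":  {"technical"},
    "emergence":                 {"technical"},
    "geographical distribution": {"technical", "managerial", "governance"},
    "data-intensive":            {"technical", "governance"},
    "heterogeneity":             {"technical"},
}

def complexity_types(characteristics):
    """Union of the complexity types contributed by a set of characteristics."""
    result = set()
    for c in characteristics:
        result |= COMPLEXITY_CONTRIBUTIONS[c]
    return result

# A geographically distributed, heterogeneous SoS exhibits all three types:
print(sorted(complexity_types({"geographical distribution", "heterogeneity"})))
# ['governance', 'managerial', 'technical']
```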

Large-scale systems of systems are now unimaginably complex entities that cannot be understood or analyzed as a whole. As I discuss in Section 20.3, the large number of interactions between the parts and the dynamic nature of these interactions means that conventional engineering approaches do not work well for complex systems. It is complexity that is the root cause of problems in projects to develop large software-intensive systems, not poor management or technical failings.

20.2 Systems of systems classification

Earlier, I suggested that the distinguishing feature of a system of systems

was that two or more of its elements were independently managed.

Different people with

different priorities have the authority to take day-to-day operational

decisions about changes to the system. As their work is not necessarily

aligned, conflicts can arise that require a significant amount of time and

effort to resolve. Systems of systems, therefore, always have some degree

of managerial complexity.

However, this broad definition of SoS covers a very wide range of system

types. It

includes systems that are owned by a single organization but are managed

by different


parts of that organization. It also includes systems whose constituent

systems are

owned and managed by different organizations that may, at times,

compete with

each other. Maier (Maier 1998) devised a classification scheme for SoS

based on

their governance and management complexity:

1. Directed systems. Directed SoS are owned by a single organization and

are developed by integrating systems that are also owned by that

organization. The

system elements may be independently managed by parts of the

organization.

However, there is an ultimate governing body within the organization that

can

set priorities for system management. It can resolve disputes between the managers of different elements of the system. Directed systems therefore have some managerial complexity but no governance complexity. A military command-and-control system that integrates information from airborne and ground-based systems is an example of a directed SoS.

2. Collaborative systems. Collaborative SoS are systems with no central

authority to set management priorities and resolve disputes. Typically,

elements of the

system are owned and governed by different organizations. However, all

of the

organizations involved recognize the mutual benefits of joint governance

of

the system. They therefore usually set up a voluntary governance body

that

makes decisions about the system. Collaborative systems have both managerial complexity and a limited degree of governance complexity. An

integrated

public transport information system is an example of a collaborative

system of

systems. Bus, rail, and air transport providers agree to link their systems to

provide passengers with up-to-date information.

3. Virtual systems. Virtual systems have no central governance, and the participants may not agree on the overall purpose of the system. Participant systems may enter or leave the SoS. Interoperability is not guaranteed but depends on published interfaces that may change. These systems have a very high degree of both managerial and governance complexity. An example of a virtual SoS is an automated high-speed algorithmic trading system. These systems from different companies automatically buy and sell stock from each other, with trades taking place in fractions of a second.

Unfortunately, I think that the names that Maier has used do not really reflect the distinctions between these different types of systems. As Maier himself says, there is always some collaboration in the management of the system elements. So, “collaborative systems” is not really a good name. The term directed systems implies top-down authority. However, even within a single organization, the need to maintain good working relationships between the people involved means that governance is agreed to rather than imposed.

In “virtual” SoS, there may be no formal mechanisms for collaboration, but the system has some mutual benefit for all participants. Therefore, they are likely to collaborate informally to ensure that the system can continue to operate. Furthermore, Maier’s use of the term virtual could be confusing because “virtual” has now come to mean “implemented by software,” as in virtual machines and virtual reality.


Figure 20.4 SoS collaboration (the Organizational, Federated, and Coalition types of SoS, showing the governance, management, and technical levels for constituent Systems 1, 2, and 3)

Figure 20.4 illustrates the collaboration in these different types of system. Rather than use Maier’s names, I have used what I hope are more descriptive terms:

1. Organizational systems of systems are SoS where the governance and management of the system lies within the same organization or company. These correspond to Maier’s “directed SoS.” Collaboration between system owners is managed by the organization. The SoS may be geographically distributed, with different parts of the system subject to different national laws and regulations. In Figure 20.4, Systems 1, 2, and 3 are independently managed, but the governance of these systems is centralized.

2. Federated systems are SoS where the governance of the SoS depends on a voluntary participative body in which all of the system owners are represented. In Figure 20.4, this is shown by the owners of Systems 1, 2, and 3 participating in a single governance body. The system owners agree to collaborate and believe that decisions made by the governance body are binding. They implement these decisions in their individual management policies, although implementations may differ because of national laws, regulations, and culture.

3. System of system coalitions are SoS with no formal governance mechanisms but where the organizations involved informally collaborate and manage their own systems to maintain the system as a whole. For example, if one system provides a data feed to others, the managers of that system will not change the format of the data without notice. Figure 20.4 shows that there is no governance at the organizational level but that informal collaboration exists at the management level.

This governance-based classification scheme provides a means of identifying the governance requirements for a SoS. By classifying a system according to this model, you can check if the appropriate governance structures exist and if these are the ones you really need. Setting up these structures across organizations is a political process and inevitably takes a long time. It is therefore helpful to understand the governance problem early in the process and take actions to ensure that appropriate governance is in place. It may be the case that you need to adopt a governance model that moves a system from one class to another. Moving the governance model to the left in Figure 20.4 usually reduces complexity.

As I have suggested, the school digital learning environment (iLearn) is a system of systems. As well as the digital learning system itself, it is connected to school administration systems and to network management systems. These network management systems are used for Internet filtering, which stops students from accessing undesirable material on the Internet.

iLearn is a relatively simple technical system, but it has a high level of governance complexity. This complexity arises because of the way that education is funded and managed. In many countries pre-university education is funded and organized at a local level rather than at a national level. States, cities, or counties are responsible for schools in their area and have autonomy in deciding school funding and policies. Each local authority maintains its own school administration system and network management system.

In Scotland, there are 32 local authorities with responsibility for education in their area. School administration is outsourced to one of three providers, and iLearn must connect to their systems. However, each local authority has its own network management policies, with separate network management systems involved.

The development of a digital learning system is a national initiative, but to create a digital learning environment, it has to be integrated with network management and school administration systems. It is therefore a system of systems with administration and network management systems, as well as the systems within iLearn such as Office 365 and Wordpress. There is no common governance process across authorities, so, according to the classification scheme, this is a coalition of systems. In practice, this means that it cannot be guaranteed that students in different places can access the same tools and content, because of different Internet filtering policies.

When we produced the conceptual model for the system, we made a strong recommendation that common policies should be established across local authorities on administrative information provision and Internet filtering. In essence, we suggested that the system should be a federated system rather than a coalition of systems. This suggestion requires a new governance body to be established to agree on common policies and standards for the system.

20.3 Reductionism and complex systems

I have already suggested that our current software engineering methods and technologies cannot cope with the complexity that is inherent in modern systems of systems. Of course, this idea is not new: Progress in all engineering disciplines has always been driven by challenging and difficult problems. New methods and tools are developed in response to failures and difficulties with existing approaches.


In software engineering, we have seen the incredibly rapid development of the discipline to help manage the increasing size and complexity of software systems. This effort has been very successful indeed. We can now build systems that are orders of magnitude larger and more complex than those of the 1960s and 1970s.

As with other engineering disciplines, the approach that has been the basis of complexity management in software engineering is called reductionism. Reductionism is a philosophical position based on the assumption that any system is made up of parts or subsystems. It assumes that the behavior and properties of the system as a whole can be understood and predicted by understanding the individual parts and the relationships between these parts. Therefore, to design a system, the parts making up that system are identified, constructed separately, and then assembled into the complete system. Systems can be thought of as hierarchies, with the important relationships between parent and child nodes in the hierarchy.

Reductionism has been and continues to be the fundamental underpinning approach to all kinds of engineering. We can identify common abstractions across the same types of system and design and build these separately. They can then be integrated to create the required system. For example, the abstractions in an automobile might be a body shell, a drive train, an engine, a fuel system, and so on. There are a relatively small number of relationships between these abstractions, so it is possible to specify interfaces and design and build each part of the system separately.

The same reductionist approach has been the basis of software engineering for almost 50 years. Top-down design, where you start with a very high-level model of a system and break this down into its components, is a reductionist approach. This is the basis of all software design methods, such as object-oriented design. Programming languages include abstractions, such as procedures and objects, that directly reflect reductionist system decomposition.
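This mapping from decomposition to program structure can be seen in miniature in any object-oriented program: the whole is assembled from separately constructed parts with a small number of explicit relationships. A minimal sketch in Python, using class names borrowed from the automobile example above purely for illustration:

```python
# A reductionist decomposition: the whole (Car) is understood and built
# from separately defined parts with explicit, limited relationships.
class Engine:
    def start(self):
        return "engine running"

class DriveTrain:
    def __init__(self, engine):
        self.engine = engine  # explicit parent-child relationship

    def engage(self):
        return self.engine.start() + ", drive engaged"

class Car:
    def __init__(self):
        # Parts are constructed separately, then assembled into the system.
        self.engine = Engine()
        self.drive_train = DriveTrain(self.engine)

    def drive(self):
        return self.drive_train.engage()

print(Car().drive())  # -> engine running, drive engaged
```

Each part can be built and tested in isolation precisely because its relationships to the other parts are few and explicit.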

Agile methods, although they may appear quite different from top-down systems design, are also reductionist. They rely on being able to decompose a system into parts, implement these parts separately, and then integrate these to create the system. The only real difference between agile methods and top-down design is that the system is decomposed into components incrementally rather than all at once.

Reductionist methods are most successful when there are relatively few relationships or interactions between the parts of a system and it is possible to model these relationships in a scientific way. This is generally true for mechanical and electrical systems where there are physical linkages between the system components. It is less true for electronic systems and certainly not the case for software systems, where there may be many more static and dynamic relationships between system components.

The distinction between software and hardware components was recognized in the 1970s. Design methods emphasized the importance of limiting and controlling the relationships between the parts of a system. These methods suggested that components should be tightly integrated, with loose coupling between these components. Tight integration meant that most of the relationships were internal to a component, and loose coupling meant that there were relatively few component–component relationships. The need for tight integration (data and operations) and loose coupling was the driver for the development of object-oriented software engineering.

Figure 20.5 Reductionist assumptions and complex system reality:
- Control: reductionism assumes that the owners of a system control its development; in a SoS there is no single system owner or controller.
- Decision making: reductionism assumes that decisions are made rationally, driven by technical criteria; in a SoS decision making is driven by political motives.
- Problem definition: reductionism assumes that there is a definable problem and clear system boundaries; a SoS addresses a wicked problem with constantly renegotiated system boundaries.

Unfortunately, controlling the number and types of relationship is practically impossible in large systems, especially systems of systems. Reductionism does not work well when there are many relationships in a system and when these relationships are difficult to understand and analyze. Therefore, any type of large system development is likely to run into difficulties.

The reasons for these potential difficulties are that the fundamental assumptions inherent to reductionism are inapplicable for large and complex systems (Sommerville et al. 2012). These assumptions are shown in Figure 20.5 and apply in three areas:

1. System ownership and control Reductionism assumes that there is a controlling authority for a system that can resolve disputes and make high-level technical decisions that will apply across the system. As we have seen, because there are multiple bodies involved in their governance, this is simply not true for systems of systems.

2. Rational decision making Reductionism assumes that interactions between components can be objectively assessed by, for example, mathematical modeling. These assessments are the driver for system decision making. Therefore, if one particular design of a vehicle, say, offers the best fuel economy without a reduction in power, then a reductionist approach assumes that this will be the design chosen.

3. Defined system boundaries Reductionism assumes that the boundaries of a system can be agreed to and defined. This is often straightforward: There may be a physical shell defining the system as in a car, a bridge has to cross a given stretch of water, and so on. Complex systems are often developed to address wicked problems (Rittel and Webber 1973). For such problems, deciding on what is part of the system and what is outside it is usually a subjective judgment, with frequent disagreements between the stakeholders involved.


These reductionist assumptions break down for all complex systems, but when these systems are software-intensive, the difficulties are compounded:

1. Relationships in software systems are not governed by physical laws. We cannot produce mathematical models of software systems that will predict their behavior and attributes. We therefore have no scientific basis for decision making. Political factors are usually the driver of decision making for large and complex software systems.

2. Software has no physical limitations; hence there are no limits on where the boundaries of a system should be drawn. Different stakeholders will argue for the boundaries to be placed in such a way that is best for them. Furthermore, it is much easier to change software requirements than hardware requirements. The boundaries and the scope of a system are likely to change during its development.

3. Linking software systems from different owners is relatively easy; hence we are more likely to try and create a SoS where there is no single governing body. The management and evolution of the different systems involved cannot be completely controlled.

For these reasons, I believe that the problems and difficulties that are commonplace in large software systems engineering are inevitable. Failures of large government projects such as the health automation projects in the UK and the United States are a consequence of complexity rather than technical or project management failures.

Reductionist approaches such as object-oriented development have been very successful in improving our ability to engineer many types of software system. They will continue to be useful and effective in developing small and medium-sized systems whose complexity can be controlled and which may be parts of a software SoS. However, because of the fundamental assumptions underlying reductionism, “improving” these methods will not lead to an improvement in our ability to engineer complex systems of systems. Rather, we need new abstractions, methods, and tools that recognize the technical, human, social, and political complexities of SoS engineering. I believe that these new methods will be probabilistic and statistical and that tools will rely on system simulation to support decision making. Developing these new approaches is a major challenge for software and systems engineering in the 21st century.

20.4 Systems of systems engineering

Systems of systems engineering is the process of integrating existing systems to create new functionality and capabilities. Systems of systems are not designed in a top-down way. Rather, they are created when an organization recognizes that it can add value to existing systems by integrating these into a SoS. For example, a city government might wish to reduce air pollution at particular hot-spots in the city. To do so, it might integrate its traffic management system with a national real-time pollution monitoring system. This then allows the traffic management system to alter its strategy to reduce pollution by changing traffic light sequences, speed limits, and so on.
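The city example can be sketched as one system consuming readings from another and adjusting its strategy when a threshold is crossed. This is a hypothetical illustration; the class names, threshold, and strategies are invented, not taken from any real deployment:

```python
class PollutionMonitor:
    """Stands in for the national real-time pollution monitoring system."""
    def __init__(self, readings):
        self.readings = readings  # hotspot name -> pollution level

    def level_at(self, hotspot):
        return self.readings.get(hotspot, 0)

class TrafficManager:
    """Stands in for the city traffic management system."""
    THRESHOLD = 50  # illustrative air-quality limit

    def __init__(self, monitor):
        self.monitor = monitor  # the integrated constituent system

    def strategy_for(self, hotspot):
        # Alter traffic light sequences and speed limits at polluted hotspots.
        if self.monitor.level_at(hotspot) > self.THRESHOLD:
            return "reduce flow: longer red phases, lower speed limit"
        return "normal operation"

monitor = PollutionMonitor({"city-centre": 72, "ring-road": 31})
manager = TrafficManager(monitor)
print(manager.strategy_for("city-centre"))  # -> reduce flow: ...
print(manager.strategy_for("ring-road"))    # -> normal operation
```

The point of the sketch is the SoS pattern: neither system was designed for the other, but linking them creates a capability (pollution-aware traffic control) that neither has alone.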

Figure 20.6 An SoS engineering process (conceptual design, system selection, architectural design, interface development, and integration and deployment, drawing on systems knowledge, with governance and management policy setting carried out in parallel)

The problems of software SoS engineering have much in common with the problems of integrating large-scale application systems that I discussed in Chapter 15 (Boehm and Abts 1999). To recap, these were:

1. Lack of control over system functionality and performance.
2. Differing and incompatible assumptions made by the developers of the different systems.
3. Different evolution strategies and timetables for the different systems.
4. Lack of support from system owners when problems arise.

Much of the effort in building systems of software systems comes from addressing these problems. It involves deciding on the system architecture, developing software interfaces that reconcile differences between the participating systems, and making the system resilient to unforeseen changes that may occur.

Software systems of systems are large and complex entities, and the processes used for their development vary widely depending on the type of systems involved, the application domain, and the needs of the organizations involved in developing the SoS. However, as shown in Figure 20.6, five general activities are involved in SoS development processes:

1. Conceptual design I introduced the idea of conceptual design in Chapter 19, which covers systems engineering. Conceptual design is the activity of creating a high-level vision for a system, defining essential requirements, and identifying constraints on the overall system. In SoS engineering, an important input to the conceptual design process is knowledge of the existing systems that may participate in the SoS.

2. System selection During this activity, a set of systems for inclusion in the SoS is chosen. This process is comparable to the process of choosing application systems for reuse, covered in Chapter 15. You need to assess and evaluate existing systems to choose the capabilities that you need. When you are selecting application systems, the selection criteria are largely commercial; that is, which systems offer the most suitable functionality at a price you are prepared to pay? However, political imperatives and issues of system governance and management are often the key factors that influence what systems are included in a SoS. For example, some systems may be excluded from consideration because an organization does not wish to collaborate with a competitor. In other cases, organizations that are contributing to a federation of systems may have systems in place and insist that these are used, even though they are not necessarily the best systems.

3. Architectural design In parallel with system selection, an overall architecture for the SoS has to be developed. Architectural design is a major topic in its own right that I cover in Section 20.5.

4. Interface development The different systems involved in a SoS usually have incompatible interfaces. Therefore, a major part of the software engineering effort in developing a SoS is to develop interfaces so that constituent systems can interoperate. This may also involve the development of a unified user interface so that SoS operators do not have to deal with multiple user interfaces as they use the different systems in the SoS.

5. Integration and deployment This stage involves making the different systems involved in the SoS work together and interoperate through the developed interfaces. System deployment means putting the system into place in the organizations concerned and making it operational.

In parallel with these technical activities, there needs to be a high-level activity concerned with establishing policies for the governance of the system of systems and defining management guidelines to implement these policies. Where there are several organizations involved, this process can be prolonged and difficult. It may involve organizations changing their own policies and processes. It is therefore important to start governance discussions at an early stage in the SoS development process.

20.4.1 Interface development

The constituent systems in a SoS are usually developed independently for some specific purpose. Their user interface is tailored to that original purpose. These systems may or may not have application programming interfaces (APIs) that allow other systems to interface directly to them. Therefore, when these systems are integrated into a SoS, software interfaces have to be developed that allow the constituent systems in the SoS to interoperate.

In general, the aim in SoS development is for systems to be able to communicate directly with each other without user intervention. If these systems already offer a service-based interface, as discussed in Chapter 18, then this communication can be implemented using this approach. Interface development involves describing how to use the interfaces to access the functionality of each system. The systems involved can communicate directly with each other. System coalitions, where all of the systems involved are peers, are likely to use this type of direct interaction as it does not require prearranged agreements on system communication protocols.

Figure 20.7 Systems with service interfaces (a principal system offers a unified service interface to Systems 1, 2, and 3, each of which exposes its own service interface)

More commonly, however, the constituent systems in a SoS either have their own specialized API or only allow their functionality to be accessed through their user interfaces. You therefore have to develop software that reconciles the differences between these interfaces. It is best to implement these interfaces as service-based interfaces, as shown in Figure 20.7 (Sillitto 2010).
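One common way to reconcile such interface differences is an adapter: a thin service layer that exposes uniform operations and implements them by calling the constituent system's own API. A minimal sketch with invented names (a real SoS interface would typically be a web service, as discussed in Chapter 18):

```python
class LegacyAdminSystem:
    """An existing system with its own specialized API (invented for illustration)."""
    def fetch_pupil_record(self, pupil_id):
        return {"id": pupil_id, "school": "Anytown High"}

class AdminServiceInterface:
    """Service-based interface that hides the legacy API behind named services."""
    def __init__(self, legacy):
        self.legacy = legacy

    def get_student(self, student_id):
        # The service is implemented by a call to the underlying system API,
        # translating its record format into the one the SoS agrees on.
        record = self.legacy.fetch_pupil_record(student_id)
        return {"student_id": record["id"], "school": record["school"]}

service = AdminServiceInterface(LegacyAdminSystem())
print(service.get_student("s42"))
```

The adapter isolates the rest of the SoS from the legacy system's idiosyncrasies: if that system changes, only the adapter has to be updated.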

To develop service-based interfaces, you have to examine the functionality of existing systems and define a set of services to reflect that functionality. The interface then provides these services. The services are implemented either by calls to the underlying system API or by mimicking user interaction with the system. One of the systems in the SoS is usually a principal or coordinating system that manages the interactions between the constituent systems. The principal system acts as a service broker, directing service calls between the different systems in the SoS. Each system therefore does not need to know which other system is providing a called service.
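The broker role of the principal system can be sketched as a registry that maps service names to whichever constituent system provides them, so callers never need to know which system answers a request. The service names and handlers below are illustrative:

```python
class ServiceBroker:
    """Principal system: directs each service call to the system providing it."""
    def __init__(self):
        self.providers = {}

    def register(self, service_name, handler):
        # A constituent system registers the services it offers.
        self.providers[service_name] = handler

    def call(self, service_name, *args):
        # The caller does not know (or care) which system responds.
        handler = self.providers.get(service_name)
        if handler is None:
            raise LookupError(f"no system provides service {service_name!r}")
        return handler(*args)

broker = ServiceBroker()
broker.register("timetable", lambda pupil: f"timetable for {pupil}")
broker.register("filtering-policy", lambda school: f"policy for {school}")

print(broker.call("timetable", "s42"))  # -> timetable for s42
```

Because providers are looked up by service name, a constituent system can be replaced by registering a new handler under the same name, without changing any caller.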

User interfaces for each system in a SoS are likely to be different. The principal system must have an overall user interface that handles user authentication and provides access to the features of the underlying system. However, it is usually expensive and time consuming to implement a unified user interface to replace the individual interfaces of the underlying systems.

A unified user interface (UI) makes it easier for new users to learn to use the SoS and reduces the likelihood of user error. However, whether or not unified UI development is cost-effective depends on a number of factors:

1. The interaction assumptions of the systems in the SoS Some systems may have a process-driven model of interaction where the system controls the interface and prompts the user for inputs. Others may give control to the user, so that the user chooses the sequence of interactions with the system. It is practically impossible to unify different interaction models.


2. The mode of use of the SoS In many cases, SoS are used in such a way that most of the interactions of users at a site are with one of the constituent systems. They use other systems only when additional information is required. For example, air traffic controllers may normally use a radar system for flight information and only access a flight plan database when additional information is required. A unified interface is a bad idea in these situations because it would slow down interaction with the most commonly used system. However, if the operators interact with all of the constituent systems, then a unified UI may be the best way forward.

3. The “openness” of the SoS If the SoS is open, so that new systems may be added to it when it is in use, then unified UI development is impractical. It is impossible to anticipate what the UI of new systems will be. Openness also applies to the organizations using the SoS. If new organizations can become involved, then they may have existing equipment and their own preferences for user interaction. They may therefore prefer not to have a unified UI.

In practice, the limiting factor in UI unification is likely to be the budget and time available for UI development. UI development is one of the most expensive systems engineering activities. In many cases, there is simply not enough project budget available to pay for the creation of a unified SoS user interface.

20.4.2 Integration and deployment

System integration and deployment are usually separate activities. A system is integrated from its components by an integration and testing team, validated, and then released for deployment. The components are managed so that changes are controlled and the integration team can be confident that the required version is included in the system. However, for SoS, such an approach may not be possible. Some of the component systems may already be deployed and in use, and the integration team cannot control changes to these systems.

For SoS, therefore, it makes sense to consider integration and deployment to be part of the same process. This approach reflects one of the design guidelines that I discuss in the following section, which is that an incomplete system of systems should be usable and provide useful functionality. The integration process should begin with systems that are already deployed, with new systems added to the SoS to provide coherent additions to the functionality of the overall system.

It often makes sense to plan the deployment of the SoS to reflect this, so that SoS deployment takes place in a number of stages. For example, Figure 20.8 illustrates a three-stage deployment process for the iLearn digital learning environment:

1. The initial deployment provides authentication, basic learning functionality, and integration with school administration systems.

2. Stage 2 of the deployment adds an integrated storage system and a set of more specialized tools to support subject-specific learning. These tools might include archives for history, simulation systems for science, and programming environments for computing.

Figure 20.8 Release sequence for the iLearn SoS (three staged releases, iLearn V1 to V3, each built on authentication and storage systems and progressively adding tools such as Moodle VLE, Wordpress, Google Apps, Office 365, school admin systems, a conferencing system, content systems (history, languages, etc.), a learning portfolio system, science simulation systems, programming environments, data analysis tools, ibook tools, a configuration system, age-specific tools, and drawing and photo tools)

3. Stage 3 adds features for user configuration and the ability of users to add new systems to the iLearn environment. This stage allows different versions of the system to be created for different age groups, further specialized tools, and alternatives to the standard tools to be included.
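A staged release plan like this can be captured as data, with each deployment stage listing the coherent increment it adds on top of what is already deployed. The groupings below follow the three stages just described; the individual capability names are illustrative summaries, not an official release manifest:

```python
# Staged deployment plan for an SoS: each release adds a coherent
# increment of functionality on top of the systems already deployed.
RELEASES = {
    "iLearn V1": ["authentication", "school admin integration",
                  "basic learning tools"],
    "iLearn V2": ["integrated storage",
                  "subject-specific tools (archives, simulations, "
                  "programming environments)"],
    "iLearn V3": ["user configuration", "user-added systems",
                  "age-specific versions"],
}

def deployed_capabilities(up_to):
    """Capabilities available once every stage up to `up_to` is deployed."""
    caps = []
    for release, additions in RELEASES.items():  # dicts preserve insertion order
        caps.extend(additions)
        if release == up_to:
            break
    return caps

print(deployed_capabilities("iLearn V2"))
```

Writing the plan down this way makes the incremental-delivery guideline checkable: every prefix of the release sequence should correspond to a usable system.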

As in any large systems engineering project, the most time-consuming and expensive part of system integration is system testing. Testing systems of systems is difficult and expensive for three reasons:

1. There may not be a detailed requirements specification that can be used as a basis for system testing. It may not be cost-effective to develop a SoS requirements document because the details of the system functionality are defined by the systems that are included.

2. The constituent systems may change in the course of the testing process, so tests may not be repeatable.

3. If problems are discovered, it may not be possible to fix the problems by requiring one or more of the constituent systems to be changed. Rather, some intermediate software may have to be introduced to solve the problem.

To help address some of these problems, I believe that SoS testing should take on board some of the testing techniques developed in agile methods:

1. Agile methods do not rely on having a complete system specification for system acceptance testing. Rather, stakeholders are closely engaged with the testing process and have the authority to decide when the overall system is acceptable. For SoS, a range of stakeholders should be involved in the testing process if possible, and they can comment on whether or not the system is ready for deployment.

2. Agile methods make extensive use of automated testing. This makes it much easier to rerun tests to discover if unexpected system changes have caused problems for the SoS as a whole.
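Automated checks of this kind can be written against the service interfaces rather than against a requirements specification, so they are cheap to rerun whenever a constituent system changes. A minimal sketch using plain assertions (a real project would likely use a test framework such as pytest); the checked service and its expected record format are invented for illustration:

```python
def check_service_contract(service):
    """Rerunnable check that a constituent system still honors its interface.

    If the system's owners change the data format without notice, this check
    fails immediately instead of the change silently breaking the SoS.
    """
    record = service("s42")
    assert isinstance(record, dict), "service must return a record dict"
    assert "student_id" in record, "record must carry a student_id field"
    return "contract ok"

# A stand-in for a deployed constituent system's service interface.
def student_service(student_id):
    return {"student_id": student_id, "school": "Anytown High"}

print(check_service_contract(student_service))  # -> contract ok
```

Rerunning such contract checks after each constituent-system release is the SoS analogue of agile regression testing: it detects unexpected changes early, at the interface where they can still be contained.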

Depending on the type of system, you may have to plan the installation of equipment and user training as part of the deployment process. If the system is being installed in a new environment, equipment installation is straightforward. However, if it is intended to replace an existing system, there may be problems in installing new equipment if it is not compatible with the equipment that is in use. There may not be the physical space for the new equipment to be installed alongside the working system. There may be insufficient electrical power, or users may not have time to be involved because they are busy using the current system. These nontechnical issues can delay the deployment process and slow down the adoption and use of the SoS.

20.5 Systems of systems architecture

Perhaps the most crucial activity of the systems of systems engineering process is architectural design. Architectural design involves selecting the systems to be included in the SoS, assessing how these systems will interoperate, and designing mechanisms that facilitate interaction. Key decisions on data management, redundancy, and communications are made. In essence, the SoS architect is responsible for realizing the vision set out in the conceptual design of the system. For organizational and federated systems, in particular, decisions made at this stage are crucial to the performance, resilience, and maintainability of the system of systems.

Maier (Maier 1998) discusses four general principles for the architecting of complex systems of systems:

1. Design systems so that they can deliver value if they are incomplete. Where a system is composed of several other systems, it should not just be useful if all of its components are working properly. Rather, there should be several “stable intermediate forms” so that a partial system works and can do useful things.

2. Be realistic about what can be controlled. The best performance from a SoS may be achieved when an individual or group exerts control over the overall system and its constituents. If there is no control, then delivering value from the SoS is difficult. However, attempts to overcontrol the SoS are likely to lead to resistance from the individual system owners and consequent delays in system deployment and evolution.

3. Focus on the system interfaces. To build a successful system of systems, you have to design interfaces so that the system elements can interoperate. It is important that these interfaces are not too restrictive, so that the system elements can evolve and continue to be useful participants in the SoS.

4. Provide collaboration incentives. When the system elements are independently owned and managed, it is important that each system owner has incentives to continue to participate in the system. These may be financial incentives (pay per use or reduced operational costs), access incentives (you share your data and I’ll share mine), or community incentives (participate in a SoS and you get a say in the community).

Sillitto (Sillitto 2010) has added to these principles and suggests additional important design guidelines. These include the following:

1. Design a SoS as node and web architecture. Nodes are sociotechnical systems that include data, software, hardware, infrastructure (technical components), and organizational policies, people, processes, and training (sociotechnical). The web is not just the communications infrastructure between nodes, but it also provides a mechanism for informal and formal social communications between the people managing and running the systems at each node.

2. Specify behavior as services exchanged between nodes. The development of service-oriented architectures now provides a standard mechanism for system interoperability. If a system does not already provide a service interface, then this interface should be implemented as part of the SoS development process.

3. Understand and manage system vulnerabilities. In any SoS, there will be unexpected failures and undesirable behavior. It is critically important to try to understand vulnerabilities and design the system to be resilient to such failures.
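Sillitto's second guideline, specifying behavior as services exchanged between nodes, can be illustrated with a minimal sketch. Everything here is hypothetical and invented for illustration: the LegacyRoster system, its method names, and the JSON message format. The point is that a constituent system with only a native function-call API is wrapped so that it offers a service-style interface to the rest of the SoS.

```python
import json

class LegacyRoster:
    """Hypothetical constituent system with only a native function-call API."""
    def __init__(self):
        self._students = {"s001": {"name": "Ada", "year": 4}}

    def lookup(self, student_id):
        return self._students.get(student_id)

class RosterService:
    """Service wrapper: exposes the legacy system through a request/response
    service interface so that other SoS nodes can interoperate with it."""
    def __init__(self, system):
        self._system = system

    def handle(self, request_json):
        request = json.loads(request_json)
        record = self._system.lookup(request["student_id"])
        if record is None:
            return json.dumps({"status": "not_found"})
        return json.dumps({"status": "ok", "student": record})

service = RosterService(LegacyRoster())
```

Other nodes then exchange self-describing messages with the wrapper (for example, `service.handle('{"student_id": "s001"}')`) rather than calling the legacy API directly, so the legacy system can change internally without breaking its partners.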

The key message that emerges from both Maier’s and Sillitto’s work is that SoS architects have to take a broad perspective. They need to look at the system as a whole, taking into account both technical and sociotechnical considerations. Sometimes the best solution to a problem is not more software but changes to the rules and policies that govern the operation of the system.

Architectural frameworks such as MODAF (MOD 2008) and TOGAF (TOGAF is a registered trademark of The Open Group 2011) have been suggested as a means of supporting the architectural design of systems of systems. Architectural frameworks were originally developed to support enterprise systems architectures, which are portfolios of separate systems. Enterprise systems may be organizational systems of systems, or they may have a simpler management structure so that the system portfolio can be managed as a whole. Architectural frameworks are intended for the development of organizational systems of systems where there is a single governance authority for the entire SoS.

An architectural framework recognizes that a single model of an architecture does not present all of the information needed for architectural and business analysis. Rather, frameworks propose a number of architectural views that should be created and maintained to describe and document enterprise systems. Frameworks have much in common and tend to reflect the language and history of the organizations involved. For example, MODAF and DODAF are comparable frameworks from the UK Ministry of Defence (MOD) and the U.S. Department of Defense (DOD).

[Figure 20.9 The TOGAF architecture development method (TOGAF® Version 9.1, © 1999–2011, The Open Group). The diagram shows the ADM phases arranged in a cycle around a central Requirements management activity: Preliminary; A. Architecture vision; B. Business architecture; C. Information systems architectures; D. Technology architecture; E. Opportunities and solutions; F. Migration planning; G. Implementation governance; H. Architecture change management.]

The TOGAF framework has been developed by the Open Group as an open standard and is intended to support the design of a business architecture, a data architecture, an application architecture, and a technology architecture for an enterprise. At its heart is the Architecture Development Method (ADM), which consists of a number of discrete phases. These are shown in Figure 20.9, taken from the TOGAF reference documentation (Open Group 2011).

All architectural frameworks involve the production and management of a large set of architectural models. Each of the activities shown in Figure 20.8 leads to the production of system models. However, this is problematic for two reasons:

1. Initial model development takes a long time and involves extensive negotiations between system stakeholders. This slows the development of the overall system.

2. It is time-consuming and expensive to maintain model consistency as changes are made to the organization and the constituent systems in a SoS.

Architecture frameworks are fundamentally reductionist, and they largely ignore sociotechnical and political issues. While they do recognize that problems are difficult to define and are open-ended, they assume a degree of control and governance that is impossible to achieve in many systems of systems. They are a useful checklist to remind architects of things to think about in the architectural design process. However, I think that the overhead involved in model management and the reductionist approach taken by frameworks limit their usefulness in SoS architectural design.

[Figure 20.10 Systems as data feeds: a principal system is supplied by four separate systems (data feeds 1–4).]

20.5.1 Architectural patterns for systems of systems

I have described architectural patterns for different types of system in Chapters 6, 17, and 21. In short, an architectural pattern is a stylized architecture that can be recognized across a range of different systems. Architectural patterns are a useful way of stimulating discussions about the most appropriate architecture for a system and for documenting and explaining the architectures used. This section covers a number of “typical” patterns in systems of software systems. As with all architectural patterns, real systems are usually based on more than one of these patterns.

The notion of architectural patterns for systems of systems is still at an early stage of development. Kawalsky (Kawalsky et al. 2013) discusses the value of architectural patterns in understanding and supporting SoS design, with a focus on patterns for command and control systems. I find that patterns are effective in illustrating SoS organization, without the need for detailed domain knowledge.

Systems as data-feeds

In this architectural pattern (Figure 20.10), there is a principal system that requires data of different types. This data is available from other systems, and the principal system queries these systems to get the data required. Generally, the systems that provide data do not interact with each other. This pattern is often observed in organizational or federated systems where some governance mechanisms are in place.

For example, to license a vehicle in the UK, you need to have both valid insurance and a roadworthiness certificate. When you interact with the vehicle licensing system, it interacts with two other systems to check that these documents are valid. These systems are:

1. An “insured vehicles” system, which is a federated system run by car insurance companies that maintains information about all current car insurance policies.

2. An “MOT certificate” system, which is used to record all roadworthiness certificates issued by testing agencies licensed by the government.

[Figure 20.11 Systems as data feeds with a unifying interface: the principal system accesses data feeds 2 and 3 directly, while a unifying interface (data feed 1) mediates access to data feeds 1(a), 1(b), and 1(c).]

The “systems as data feeds” architecture is an appropriate architecture to use when it is possible to identify entities in a unique way and create relatively simple queries about these entities. In the licensing system, vehicles can be uniquely identified by their registration number. In other systems, it may be possible to identify entities such as pollution monitors by their GPS coordinates.

A variant of the “systems as data feeds” architecture arises when a number of systems provide data that are similar but not identical. Therefore, the architecture has to include an intermediate layer, as shown in Figure 20.11. The role of this intermediate layer is to translate the general query from the principal system into the specific query required by the individual information system.

For example, the iLearn environment interacts with school administration systems from three different providers. All of these systems provide the same information about students (names, personal information, etc.) but have different interfaces. The databases have different organizations, and the format of the data returned differs from one system to another. The unifying interface here detects where the user of the system is based and, using this regional information, knows which administrative system should be accessed. It then converts a standard query into the appropriate query for that system.
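The translation performed by a unifying interface can be sketched as a small dispatcher. The region names, query fields, and the two provider systems below are hypothetical stand-ins, not the actual iLearn implementation; the point is that each provider has a different native query format, and the unifying layer converts a standard query into the provider-specific one and the results back into one standard format.

```python
class ProviderA:
    """Hypothetical administration system: expects positional surname/forename."""
    def fetch_student(self, surname, forename):
        return {"family": surname, "given": forename}

class ProviderB:
    """Hypothetical administration system: expects a single query dictionary."""
    def query(self, q):
        return {"family": q["last_name"], "given": q["first_name"]}

class UnifyingInterface:
    """Chooses the administration system from the user's region and translates
    a standard query into each provider's own query format."""
    def __init__(self):
        self._adapters = {
            "north": lambda first, last: ProviderA().fetch_student(last, first),
            "south": lambda first, last: ProviderB().query(
                {"first_name": first, "last_name": last}),
        }

    def get_student(self, region, first_name, last_name):
        raw = self._adapters[region](first_name, last_name)
        # Whatever the source system, return the data in one standard format.
        return {"first_name": raw["given"], "last_name": raw["family"]}

ui = UnifyingInterface()
```

A call such as `ui.get_student("north", "Grace", "Hopper")` returns the same standard record structure regardless of which regional system answered it.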

Problems that can arise in systems that use this pattern are primarily interface problems when the data feeds are unavailable or are slow to respond. It is important to ensure that timeouts are included in the system so that a failure of a data feed does not compromise the response time of the system as a whole. Governance mechanisms should be in place to ensure that the format of provided data is not changed without the agreement of all system owners.
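One way to apply such timeouts is to run each feed query with a deadline and fall back to a default answer if the feed does not respond in time. This is a sketch using Python's standard concurrent.futures module; the two feed functions are invented stand-ins for real remote queries, with the slow one simulating an unresponsive feed.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

pool = ThreadPoolExecutor(max_workers=4)

def query_with_timeout(feed, argument, timeout_s, default):
    """Query one data feed; if it does not answer within the timeout, return
    a default so a slow feed cannot compromise the principal system's response."""
    future = pool.submit(feed, argument)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return default

def insured_vehicles_feed(reg):       # stand-in for a fast remote feed
    return {"registration": reg, "insured": True}

def mot_certificate_feed(reg):        # stand-in for a slow, failing feed
    time.sleep(1.0)
    return {"registration": reg, "valid_mot": True}

print(query_with_timeout(insured_vehicles_feed, "AB12CDE", 0.2, {"insured": "unknown"}))
print(query_with_timeout(mot_certificate_feed, "AB12CDE", 0.2, {"valid_mot": "unknown"}))
```

The principal system can then treat an "unknown" answer explicitly (for example, by asking the user to retry) rather than hanging while a feed is down.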

Systems in a container

Systems in a container are systems of systems where one of the systems acts as a virtual container and provides a set of common services such as an authentication and a storage service. Conceptually, other systems are then placed into this container to make their functionality accessible to system users. Figure 20.12 illustrates a container system with three common services and six included systems. The systems that are included may be selected from an approved list of systems and need not be aware that they are included in the container. This pattern of SoS is most often observed in federated systems or system coalitions.

[Figure 20.12 Systems in a container: included systems s1–s6 sit within a container system that provides common services 1–3.]

The iLearn environment is a system in a container. There are common services that support authentication, storage of user data, and system configuration. Other functionality comes from choosing existing systems, such as a newspaper archive or a virtual learning environment, and integrating these into the container.

Of course, you don’t place systems into a real container to implement these systems of systems. Rather, for each approved system, there is a separate interface that allows it to be integrated with the common services. This interface manages the translation between the common services provided by the container and the requirements of the integrated system. It may also be possible to include systems that are not approved. However, these will not have access to the common services provided by the container.

Figure 20.13 illustrates this integration. This graphic is a simplified version of iLearn that provides three common services:

1. An authentication service that provides a single sign-in to all approved systems. Users do not have to maintain separate credentials for these systems.

2. A storage service for user data. Data can be seamlessly transferred to and from approved systems.

3. A configuration service that is used to include or remove systems from the container.
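The relationship between the container's common services and the per-system interfaces can be sketched as follows. The service names follow the iLearn description above, but the Container and AppInterface classes, their methods, and the sample data are illustrative assumptions rather than the actual iLearn design.

```python
class Container:
    """Container system offering the three common services described above:
    authentication, storage of user data, and configuration."""
    def __init__(self):
        self._users = {"alice": "secret"}  # authentication service data
        self._storage = {}                 # storage service: user -> {system: data}
        self._systems = {}                 # configuration service: approved systems

    def authenticate(self, user, password):
        return self._users.get(user) == password

    def store(self, user, system_name, data):
        self._storage.setdefault(user, {})[system_name] = data

    def fetch(self, user, system_name):
        return self._storage.get(user, {}).get(system_name)

    # Configuration service: include or remove systems from the container.
    def include(self, name, interface):
        self._systems[name] = interface

    def remove(self, name):
        self._systems.pop(name, None)

    def interface_for(self, name):
        return self._systems.get(name)   # None for unapproved systems

class AppInterface:
    """Separate interface for one approved system, translating between the
    container's common services and that system's requirements."""
    def __init__(self, container, system_name):
        self._container = container
        self._name = system_name

    def save_work(self, user, document):
        # The container's storage service holds the user's data, so it can
        # move between approved systems without separate log-ins.
        self._container.store(user, self._name, document)

dle = Container()
dle.include("moodle", AppInterface(dle, "moodle"))
```

An unapproved system such as YouTube simply has no AppInterface registered, which mirrors the situation described above: it can still be used, but not through the common services.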

This example shows a version of iLearn for Physics. As well as an office productivity system (Office 365) and a VLE (Moodle), this system includes simulation and data analysis systems. Other systems—YouTube and a science encyclopedia—are also part of this system. However, these are not “approved,” and so no container interface is available. Users must log on to these systems separately and organize their own data transfers.

[Figure 20.13 The DLE as a container system: the Digital Learning Environment provides authentication, storage, and configuration services; interfaces connect the approved systems (MS Office 365, Moodle, a Physics simulator, and a Lab data analyzer), while YouTube and a science encyclopedia are reached through external interaction with no container interface.]

There are two problems with this type of SoS architecture:

1. A separate interface must be developed for each approved system so that common services can be used with these systems. This means that only a relatively small number of approved systems can be supported.

2. The owners of the container system have no influence on the functionality and behavior of the included systems. Systems may stop working, or they may be withdrawn at any time.

However, the main benefit of this architecture is that it allows for incremental development. An early version of the container system can be based on “unapproved” systems. Interfaces to these can be developed in later versions so that they are more closely integrated with the container services.

Trading systems

Trading systems are systems of systems where there is no single principal system, but processing may take place in any of the constituent systems. The systems involved trade information among themselves. There may be one-to-one or one-to-many interactions between these systems. Each system publishes its own interface, but there may not be any interface standards that are followed by all systems. This system is shown in Figure 20.14. Trading systems may be federated systems or system coalitions.

An example of a trading SoS is a system of systems for algorithmic trading of stocks and shares. Brokers all have their own separate systems that can automatically buy and sell stock from other systems. They set prices and negotiate individually with these systems. Another example of a trading system is a travel aggregator that shows price comparisons and allows travel to be booked directly by a user.

[Figure 20.14 A trading system of systems: trading systems 1–4 interact directly with one another, with no principal system.]

Trading systems may be developed for any type of marketplace, with the information exchanged being information about the goods being traded and their prices. Although trading systems are systems in their own right and could conceivably be used for individual trading, they are most useful in an automated trading context where the systems negotiate directly with each other.

The major problem with this type of system is that there is no governance mechanism, so any of the systems involved may change at any time. Because these changes may contradict the assumptions made by other systems, trading cannot continue. Sometimes the owners of the systems in the coalition wish to be able to continue trading with other systems and so may make informal arrangements to ensure that changes to one system do not make trading impossible. In other cases, such as a travel aggregator, an airline may deliberately change its system so that it is unavailable and so force bookings to be made directly with it.
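A trading SoS can be sketched as peers that each publish their own interface and negotiate bilaterally. The classes, method names, and price logic below are invented for illustration; note that the two systems deliberately expose different interfaces, since the pattern assumes no common standard and no central authority.

```python
class BrokerSystem:
    """One trading system, publishing its own ask-price interface."""
    def __init__(self, name, ask_price):
        self.name = name
        self.ask_price = ask_price

    def quote(self, stock):
        return self.ask_price            # its published selling price

    def sell(self, stock, offer):
        return offer >= self.ask_price   # accepts any offer meeting its ask

class AlgoTrader:
    """A different trading system with its own, non-standard interface."""
    def __init__(self, budget):
        self.budget = budget
        self.holdings = []

    def try_buy(self, stock, counterparty):
        # Negotiate directly with the other system: no central node involved.
        price = counterparty.quote(stock)
        if price <= self.budget and counterparty.sell(stock, price):
            self.budget -= price
            self.holdings.append((stock, counterparty.name))
            return True
        return False

broker = BrokerSystem("broker-1", ask_price=95)
trader = AlgoTrader(budget=100)
trader.try_buy("XYZ", broker)
```

If broker-1 later changed its `quote` or `sell` interface without warning, the trader's assumptions would break and trading would stop, which is exactly the governance problem described above.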

Key Points

Systems of systems are systems where two or more of the constituent systems are independently managed and governed.

Three types of complexity are important for systems of systems—technical complexity, managerial complexity, and governance complexity.

System governance can be used as the basis for a classification scheme for SoS. This leads to three classes of SoS, namely, organizational systems, federated systems, and system coalitions.

Reductionism as an engineering method breaks down because of the inherent complexity of systems of systems. Reductionism assumes clear system boundaries, rational decision making, and well-defined problems. None of these are true for systems of systems.

The key stages of the SoS development process are conceptual design, system selection, architectural design, interface development, and integration and deployment. Governance and management policies must be designed in parallel with these activities.


Architectural patterns for systems of systems are a means of describing and discussing typical architectures for SoS. Important patterns are systems as data feeds, systems in a container, and trading systems.

Further Reading

“Architecting Principles for Systems of Systems.” A now-classic paper on systems of systems that introduces a classification scheme for SoS, discusses its value, and proposes a number of architectural principles for SoS design. (M. Maier, Systems Engineering, 1 (4), 1998).

Ultra-large Scale Systems: The Software Challenge of the Future. This book, produced for the U.S. Department of Defense in 2006, introduces the notion of ultra-large-scale systems, which are systems of systems with hundreds of nodes. It discusses the issues and challenges in developing such systems. (L. Northrop et al., Software Engineering Institute, 2006). http://www.sei.cmu.edu/library/assets/ULS_Book20062.pdf

“Large-scale Complex IT Systems.” This paper discusses the problems of large-scale complex IT systems that are systems of systems and expands on the ideas here on the breakdown of reductionism. It proposes a number of research challenges in the area of SoS. (I. Sommerville et al., Communications of the ACM, 55 (7), July 2012). http://dx.doi.org/10.1145/2209249.2209268

Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/systems-engineering/

Exercises

20.1. Explain why managerial and operational independence are the key distinguishing characteristics of systems of systems when compared to other large, complex systems.

20.2. Briefly explain any four essential characteristics of systems of systems.


20.3. The classification of SoS presented in Section 20.2 suggests a governance-based classification scheme. Giving reasons for your answer, identify the classifications for the following systems of systems:

(a) A health care system that provides unified access to all patient health records from hospitals, clinics, and primary care.

(b) The World Wide Web.

(c) A government system that provides access to a range of welfare services such as pensions, disability benefits, and unemployment benefits.

Are there any problems with the suggested classification for any of these systems?

20.4. Explain what is meant by reductionism and why it is effective as a basis for many kinds of engineering.

20.5. Define systems of systems engineering. List the problems of software SoS engineering that are also common to problems of integrating large-scale application systems.

20.6. How beneficial is a unified user interface in the interface design of SoS? What are the factors on which the cost-effectiveness of a unified user interface is dependent?

20.7. Sillitto suggests that communications between nodes in a SoS are not just technical but should also include informal sociotechnical communications between the people involved in the system. Using the iLearn SoS as an example, suggest where these informal communications may be important to improve the effectiveness of the system.

20.8. Suggest the closest-fit architectural pattern for the systems of systems introduced in Exercise 20.3.

20.9. The trading system pattern assumes that there is no central authority involved. However, in areas such as equity trading, trading systems must follow regulatory rules. Suggest how this pattern might be modified to allow a regulator to check that these rules have been followed. This should not involve all trades going through a central node.

20.10. You work for a software company that has developed a system that provides information about consumers and that is used within a SoS by a number of other retail businesses. They pay you for the services used. Discuss the ethics of changing the system interfaces without notice to coerce users into paying higher charges. Consider this question from the point of view of the company’s employees, customers, and shareholders.

References

Boehm, B., and C. Abts. 1999. “COTS Integration: Plug and Pray?” Computer 32 (1): 135–138. doi:10.1109/2.738311.

Hitchins, D. 2009. “System of Systems—The Ultimate Tautology.” http://www.hitchins.net/profs-stuff/profs-blog/system-of-systems---the.html

Kawalsky, R., D. Joannou, Y. Tian, and A. Fayoumi. 2013. “Using Architecture Patterns to Architect and Analyze Systems of Systems.” In Conference on Systems Engineering Research (CSER 13), 283–292. doi:10.1016/j.procs.2013.01.030.

Maier, M. W. 1998. “Architecting Principles for Systems-of-Systems.” Systems Engineering 1 (4): 267–284. doi:10.1002/(SICI)1520-6858(1998)1:4<267::AID-SYS3>3.0.CO;2-D.

MOD, UK. 2008. “MOD Architecture Framework.” https://www.gov.uk/mod-architecture-framework

Northrop, L., R. P. Gabriel, M. Klein, and D. Schmidt. 2006. Ultra-Large-Scale Systems: The Software Challenge of the Future. Pittsburgh: Software Engineering Institute. http://www.sei.cmu.edu/library/assets/ULS_Book20062.pdf

Open Group. 2011. “Open Group Standard TOGAF Version 9.1.” http://pubs.opengroup.org/architecture/togaf91-doc/arch/

Rittel, H., and M. Webber. 1973. “Dilemmas in a General Theory of Planning.” Policy Sciences 4: 155–169. doi:10.1007/BF01405730.

Royal Academy of Engineering. 2004. “Challenges of Complex IT Projects.” London. http://www.bcs.org/upload/pdf/complexity.pdf

Sillitto, H. 2010. “Design Principles for Ultra-Large-Scale Systems.” In Proceedings of the 20th International Council on Systems Engineering (INCOSE) International Symposium. Chicago.

Sommerville, I., D. Cliff, R. Calinescu, J. Keen, T. Kelly, M. Kwiatkowska, J. McDermid, and R. Paige. 2012. “Large-Scale Complex IT Systems.” Comm. ACM 55 (7): 71–77. doi:10.1145/2209249.2209268.

Stevens, R. 2010. Engineering Mega-Systems: The Challenge of Systems Engineering in the Information Age. Boca Raton, FL: CRC Press.

21 Real-time software engineering

Objectives

The objective of this chapter is to introduce some of the characteristic features of embedded real-time software engineering. When you have read this chapter, you will:

understand the concept of embedded software, which is used to control systems that react to external events in their environment;

have been introduced to a design process for real-time systems, where the software systems are organized as a set of cooperating processes;

understand three architectural patterns that are commonly used in embedded real-time systems design;

understand the organization of real-time operating systems and the role that they play in an embedded, real-time system.

Contents

21.1 Embedded systems design
21.2 Architectural patterns for real-time software
21.3 Timing analysis
21.4 Real-time operating systems


Computers are used to control a wide range of systems, from simple domestic machines, through games controllers, to entire manufacturing plants. These computers interact directly with hardware devices. Their software must react to events generated by the hardware and often issue control signals in response to these events. These signals result in an action, such as the initiation of a phone call, the movement of a character on the screen, the opening of a valve, or the display of the system status. The software in these systems is embedded in system hardware, often in read-only memory. It responds, in real time, to events from the system’s environment. By real time, I mean that the software system has a deadline for responding to external events. If this deadline is missed, then the overall hardware–software system will not operate correctly.

Embedded software is very important economically because almost every electrical device now includes software. There are therefore many more embedded software systems than other types of software system. Ebert and Jones (Ebert and Jones 2009) estimated that there were about 30 embedded microprocessor systems per person in developed countries. This figure was increasing between 10% and 20% per year. This suggests that, by 2020, there will be more than 100 embedded systems per person.

Responsiveness in real time is the critical difference between embedded systems and other software systems, such as information systems, web-based systems, or personal software systems, whose main purpose is data processing. For non-real-time systems, the correctness of a system can be defined by specifying how system inputs map to corresponding outputs that should be produced by the system. In response to an input, a corresponding output should be generated by the system and, often, some data should be stored. For example, if you choose a create command in a patient information system, then the correct system response is to create a new patient record in a database and to confirm that this has been done. Within reasonable limits, it does not matter how long this takes.

However, in a real-time system, the correctness depends both on the response to an input and the time taken to generate that response. If the system takes too long to respond, then the required response may be ineffective. For example, if embedded software controlling a car’s braking system is too slow, then an accident may occur because it is impossible to stop the car in time.

Therefore, time is fundamental in the definition of a real-time software system:

A real-time software system is a system whose correct operation depends on both the results produced by the system and the time at which these results are produced. A “soft real-time system” is a system whose operation is degraded if results are not produced according to the specified timing requirements. If results are not produced according to the timing specification in a “hard real-time system,” this is considered to be a system failure.
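The soft/hard distinction in this definition can be made concrete with a small sketch: a missed hard deadline is a failure, while a missed soft deadline means the result is still delivered but the system's operation is considered degraded. The function and its return values are illustrative, not taken from the book.

```python
def classify_response(deadline_ms, actual_ms, hard):
    """Judge one stimulus-response pair against its timing requirement."""
    if actual_ms <= deadline_ms:
        return "ok"
    # Deadline missed: the consequence depends on the kind of requirement.
    return "failure" if hard else "degraded"

# A braking command (hard requirement) and a status-display update (soft
# requirement), both with a 10 ms deadline but delivered after 12 ms.
print(classify_response(10, 12, hard=True))
print(classify_response(10, 12, hard=False))
```

The same lateness is thus judged completely differently depending on whether the timing requirement is hard or soft.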

Timely response is an important factor in all embedded systems, but not all embedded systems require a very fast response. For example, the insulin pump software that I have used as an example in several chapters of this book is an embedded system. However, while the system needs to check the glucose level at periodic intervals, it does not need to respond very quickly to external events. The wilderness weather station software is also an embedded system, but, again, it does not require a fast response to external events.

As well as the need for real-time response, there are other important differences between embedded systems and other types of software system:

1. Embedded systems generally run continuously and do not terminate. They start when the hardware is switched on, and execute until the hardware is switched off. Techniques for reliable software engineering, as discussed in Chapter 11, may therefore have to be used to ensure continuous operation. The real-time system may include update mechanisms that support dynamic reconfiguration so that the system can be updated while it is in service.

2. Interactions with the system’s environment are unpredictable. In interactive systems, the pace of the interaction is controlled by the system. By limiting user options, the events and commands to be processed are known in advance. By contrast, real-time embedded systems must be able to respond to expected and unexpected events at any time. This leads to a design for real-time systems based on concurrency, with several processes executing in parallel.

3. Physical limitations may affect the design of a system. Examples of limitations include restrictions on the power available to the system and the physical space taken up by the hardware. These limitations may generate requirements for the embedded software, such as the need to conserve power and so prolong battery life. Size and weight limitations may mean that the software has to take over some hardware functions because of the need to limit the number of chips used in the system.

4. Direct hardware interaction may be necessary. In interactive systems and information systems, a layer of software (the device drivers) hides the hardware from the operating system. This is possible because you can only connect a few types of device to these systems, such as keyboards, mice, and displays. By contrast, embedded systems may have to interact with a wide range of hardware devices that do not have separate device drivers.

5. Issues of safety and reliability may dominate the system design. Many embedded systems control devices whose failure may have high human or economic costs. Therefore, dependability is critical, and the system design has to ensure safety-critical behavior at all times. This often leads to a conservative approach to design where tried and tested techniques are used instead of newer techniques that may introduce new failure modes.

Real-time embedded systems can be thought of as reactive systems; that is, they must react to events in their environment (Berry 1989; Lee 2002). Response times are often governed by the laws of physics rather than chosen for human convenience. This is in contrast to other types of software where the system controls the speed of the interaction. For example, the word processor that I am using to write this book can check spelling and grammar, and there are no practical limits on the time taken to do so.


21.1 Embedded system design

During the design process for embedded software, software designers have to consider in detail the design and performance of the system hardware. Part of the system design process may involve deciding which system capabilities are to be implemented in software and which in hardware. For many real-time systems that are embedded in consumer products, such as the systems in cell phones, the costs and power consumption of the hardware are critical. Specific processors designed to support embedded systems may be used. For some systems, special-purpose hardware may have to be designed and built.

A top-down software design process, in which the design starts with an abstract model that is decomposed and developed in a series of stages, is impractical for most real-time systems. Low-level decisions on hardware, support software, and system timing must be considered early in the process. These limit the flexibility of system designers. Additional software functionality, such as battery and power management, may have to be included in the system.

Given that embedded systems are reactive systems that react to events in their environment, the most general approach to embedded, real-time software design is based on a stimulus-response model. A stimulus is an event occurring in the software system’s environment that causes the system to react in some way; a response is a signal or message that the software sends to its environment.

You can define the behavior of a real-time system by listing the stimuli received by the system, the associated responses, and the time at which the response must be produced. For example, Figure 21.1 shows possible stimuli and system responses for a burglar alarm system (discussed in Section 21.2.1).

Stimuli fall into two classes:

1. Periodic stimuli These occur at predictable time intervals. For example, the system may examine a sensor every 50 milliseconds and take action (respond) depending on that sensor value (the stimulus).

2. Aperiodic stimuli These occur irregularly and unpredictably and are usually signaled using the computer’s interrupt mechanism. An example of such a stimulus would be an interrupt indicating that an I/O transfer was complete and that data was available in a buffer.

Stimuli come from sensors in the system’s environment, and responses are sent to actuators, as shown in Figure 21.2. These actuators control equipment, such as a pump, which then makes changes to the system’s environment. The actuators themselves may also generate stimuli. The stimuli from actuators often indicate that some problem with the actuator has occurred, which must be handled by the system.

A general design guideline for real-time systems is to have separate control processes for each type of sensor and actuator (Figure 21.3). For each type of sensor, there may be a sensor management process that handles data collection from these sensors. Data-processing processes compute the required responses for the stimuli received by the system. Actuator control processes are associated with each actuator and manage the operation of that actuator. This model allows data to be collected quickly from the sensor (before it is overwritten by the next input) and enables processing and the associated actuator response to be carried out later.

Figure 21.1 Stimuli and responses for a burglar alarm system

Clear alarms: Switch off all active alarms; switch off all lights that have been switched on.
Console panic button positive: Initiate alarm; turn on lights around console; call police.
Power supply failure: Call service technician.
Sensor failure: Call service technician.
Single sensor positive: Initiate alarm; turn on lights around site of positive sensor.
Two or more sensors positive: Initiate alarm; turn on lights around sites of positive sensors; call police with location of suspected break-in.
Voltage drop of between 10% and 20%: Switch to battery backup; run power supply test.
Voltage drop of more than 20%: Switch to battery backup; initiate alarm; call police; run power supply test.

A real-time system has to respond to stimuli that occur at different times. You therefore have to organize the system architecture so that, as soon as a stimulus is received, control is transferred to the correct handler. This is impractical in sequential programs. Consequently, real-time software systems are normally designed as a set of concurrent, cooperating processes. To support the management of these processes, the execution platform on which the real-time system executes may include a real-time operating system (discussed in Section 21.4). The functions provided by this operating system are accessed through the runtime support system for the real-time programming language that is used.

There is no standard embedded system design process. Rather, different processes are used that depend on the type of system, available hardware, and the organization that is developing the system. The following activities may be included in a real-time software design process:

1. Platform selection In this activity, you choose an execution platform for the system, that is, the hardware and the real-time operating system to be used. Factors that influence these choices include the timing constraints on the system, limitations on power available, the experience of the development team, and the price target for the delivered system.

2. Stimuli/response identification This involves identifying the stimuli that the system must process and the associated response or responses for each stimulus.

Figure 21.2 A general model of an embedded real-time system (sensors provide stimuli to a real-time control system, which sends responses to actuators)

3. Timing analysis For each stimulus and associated response, you identify the timing constraints that apply to both stimulus and response processing. These constraints are used to establish the deadlines for the processes in the system.

4. Process design Process design involves aggregating the stimulus and response processing into a number of concurrent processes. A good starting point for designing the process architecture is the architectural patterns that I describe in Section 20.2. You then optimize the process architecture to reflect the specific requirements that you have to implement.

5. Algorithm design For each stimulus and response, you design algorithms to carry out the required computations. Algorithm designs may have to be developed relatively early in the design process to indicate the amount of processing required and the time needed to complete that processing. This is especially important for computationally intensive tasks, such as signal processing.

6. Data design You specify the information that is exchanged by processes and the events that coordinate information exchange, and design data structures to manage this information exchange. Several concurrent processes may share these data structures.

7. Process scheduling You design a scheduling system that will ensure that processes are started in time to meet their deadlines.

The specific activities and the activity sequence in a real-time system design process depend on the type of system being developed, its novelty, and its environment.

Figure 21.3 Sensor and actuator processes (a sensor control process passes stimulus data to a data processor, which drives an actuator control process that produces the response)

Figure 21.4 Producer/consumer processes sharing a circular buffer (the producer process adds items at the tail of the buffer, v10; the consumer process removes items from the head, v1)

In some cases, for new systems, you may be able to follow a fairly abstract approach where you start with the stimuli and associated processing, and decide on the hardware and execution platforms late in the process. In other cases, the choice of hardware and operating system is made before the software design starts. You then have to design the software to take account of the constraints imposed by the system hardware.

Processes in a real-time system have to be coordinated and share information. Process coordination mechanisms ensure mutual exclusion to shared resources. When one process is modifying a shared resource, other processes should not be able to change that resource. Mechanisms for ensuring mutual exclusion include semaphores, monitors, and critical regions. These process synchronization mechanisms are described in most operating system books (Silberschatz, Galvin, and Gagne 2013; Stallings 2014).

When designing the information exchange between processes, you have to take into account that these processes may be running at different speeds. One process is producing information, and the other process is consuming that information. If the producer is running faster than the consumer, new information could overwrite a previous information item before the consumer process has read it. If the consumer process is running faster than the producer process, the same item could be read twice.

To avoid this problem, you should implement information exchange using a shared buffer and use mutual exclusion mechanisms to control access to that buffer. This means that information can’t be overwritten before it has been read and that information cannot be read twice. Figure 21.4 illustrates the organization of a shared buffer. This is usually implemented as a circular queue, using a list data structure. Mismatches in speed between the producer and consumer processes can be accommodated without having to delay process execution.

The producer process always enters data in the buffer location at the end of the queue (represented as v10 in Figure 21.4). The consumer process always retrieves information from the head of the queue (represented as v1 in Figure 21.4). After the consumer process has retrieved the information, the head of the queue is adjusted to point at the next item (v2). After the producer process has added information, the tail of the queue is adjusted to point at the next free slot in the queue.

Obviously, it is important to ensure that the producer and consumer processes do not attempt to access the same item at the same time (i.e., when Head = Tail). If they do, the value of the item is unpredictable. The system also has to ensure that the producer process does not add items to a full buffer and that the consumer process does not try to take items from an empty buffer.

To do this, you implement the circular buffer as a process with Get and Put operations to access the buffer. The Put operation is called by the producer process and the Get operation by the consumer process. Synchronization primitives, such as semaphores or critical regions, are used to ensure that the Get and Put operations are synchronized, so that they don’t access the same location simultaneously. If the buffer is full, the Put process has to wait until a slot is free; if the buffer is empty, the Get process has to wait until an entry has been made.

Once you have chosen the execution platform for the system, designed a process architecture, and decided on a scheduling policy, you have to check that the system will meet its timing requirements. You can perform this check through static analysis of the system using knowledge of the timing behavior of components, or through simulation. This analysis may reveal that the system will not perform adequately. The process architecture, the scheduling policy, the execution platform, or all of these may then have to be redesigned to improve the performance of the system.

Timing constraints or other requirements may sometimes mean that it is best to implement some system functions, such as signal processing, in hardware. Modern hardware components, such as FPGAs (field-programmable gate arrays), are flexible and can be adapted to different functions. Hardware components deliver much better performance than the equivalent software. System processing bottlenecks can be identified and replaced by hardware, thus avoiding expensive software optimization.

21.1.1 Real-time system modeling

The events that a real-time system must react to often cause the system to move from one state to another. For this reason, state models, which I introduced in Chapter 5, are used to describe real-time systems. A state model of a system assumes that, at any time, the system is in one of a number of possible states. When a stimulus is received, this may cause a transition to a different state. For example, a system controlling a valve may move from a state “Valve open” to a state “Valve closed” when an operator command (the stimulus) is received.

State models are an integral part of real-time system design methods. The UML supports the development of state models based on Statecharts (Harel 1987, 1988). Statecharts are formal state machine models that support hierarchical states, so that groups of states can be considered as a single entity. Douglass discusses the use of the UML in real-time systems development (Douglass 1999).

I have already illustrated this approach to system modeling in Chapter 5, where I used an example of a model of a simple microwave oven. Figure 21.5 is another example of a state model that shows the operation of a fuel delivery software system embedded in a petrol (gas) pump. The rounded rectangles represent system states, and the arrows represent stimuli that force a transition from one state to another.

Figure 21.5 State machine model of a petrol (gas) pump (states: Waiting, Reading, Validating, Initializing, Ready, Delivering, Stopped, Paying, and Resetting; transitions are triggered by card insertion and removal, card validation, hose and nozzle-trigger events, timeouts, and payment acknowledgment)

The names chosen in the state machine diagram are descriptive. The associated information indicates actions taken by the system actuators or information that is displayed. Notice that this system never terminates but idles in a waiting state when the pump is not operating.

The fuel delivery system is designed to allow unattended operation, with the following sequence of actions:

1. The buyer inserts a credit card into a card reader built into the pump. This causes a transition to a Reading state where the card details are read and the buyer is then asked to remove the card.

2. Removal of the card triggers a transition to a Validating state where the card is validated.

3. If the card is valid, the system initializes the pump and, when the fuel hose is removed from its holster, transitions to the Delivering state, where it is ready to deliver fuel. Activating the trigger on the nozzle causes fuel to be pumped; this stops when the trigger is released (for simplicity, I have ignored the pressure switch that is designed to stop fuel spillage).

Real-time Java

The Java programming language has been modified to make it suitable for real-time systems development. These modifications include asynchronous communications; the addition of time, including absolute and relative time; a new thread model where threads cannot be interrupted by garbage collection; and a new memory management model that avoids the unpredictable delays that can result from garbage collection.

http://software-engineering-book.com/web/real-time-java/

4. After the fuel delivery is complete and the buyer has replaced the hose in its holster, the system moves to a Paying state where the user’s account is debited.

5. After payment, the pump software returns to the Waiting state.

State models are used in model-driven engineering, which I discussed in Chapter 5, to define the operation of a system. They can be transformed automatically or semi-automatically to an executable program.

21.1.2 Real-time programming

Programming languages for real-time systems development have to include facilities to access system hardware, and it should be possible to predict the timing of particular operations in these languages. Hard real-time systems, running on limited hardware, are still sometimes programmed in assembly language so that tight deadlines can be met. Systems programming languages, such as C, which allow efficient code to be generated, are widely used.

The advantage of using a systems programming language like C is that it allows the development of efficient programs. However, these languages do not include constructs to support concurrency or the management of shared resources. Concurrency and resource management are implemented through calls to primitives provided by the real-time operating system for mutual exclusion. Because the compiler cannot check these calls, programming errors are more likely. Programs are also often more difficult to understand because the language does not include real-time features. As well as understanding the program, the reader also has to know how real-time support is provided using system calls.

Because real-time systems must meet their timing constraints, you may not be able to use object-oriented development for hard real-time systems. Object-oriented development involves hiding data representations and accessing attribute values through operations defined with the object. There is a significant performance overhead in object-oriented systems because extra code is required to mediate access to attributes and handle calls to operations. The consequent loss of performance may make it impossible to meet real-time deadlines.

A version of Java has been developed for embedded systems development (Burns and Wellings 2009; Bruno and Bollella 2009). This language includes a modified thread mechanism, which allows threads to be specified that will not be interrupted by the language garbage collection mechanism. Asynchronous event handling and timing specification have also been included. However, at the time of writing, this specification has mostly been used on platforms that have significant processor and memory capacity (e.g., a cell phone) rather than simpler embedded systems with more limited resources. These systems are still usually implemented in C.

21.2 Architectural patterns for real-time software

Architectural patterns are abstract, stylized descriptions of good design practice. They capture knowledge about the organization of system architectures, when these architectures should be used, and their advantages and disadvantages. You use an architectural pattern to understand an architecture and as a starting point for creating your own specific architectural design.

The difference between real-time and interactive software means that there are distinct architectural patterns for real-time embedded systems. Real-time systems’ patterns are process-oriented rather than object- or component-oriented. In this section, I discuss three real-time architectural patterns that are commonly used:

1. Observe and React This pattern is used when a set of sensors is routinely monitored and displayed. When the sensors show that some event has occurred (e.g., an incoming call on a cell phone), the system reacts by initiating a process to handle that event.

2. Environmental Control This pattern is used when a system includes sensors, which provide information about the environment, and actuators that can change the environment. In response to environmental changes detected by the sensors, control signals are sent to the system actuators.

3. Process Pipeline This pattern is used when data has to be transformed from one representation to another before it can be processed. The transformation is implemented as a sequence of processing steps, which may be carried out concurrently. This allows for very fast data processing, because a separate core or processor can execute each transformation.

These patterns can of course be combined, and you will often see more than one of them in a single system. For example, when the Environmental Control pattern is used, it is very common for the actuators to be monitored using the Observe and React pattern. In the event of an actuator failure, the system may react by displaying a warning message, shutting down the actuator, switching in a backup system, and so forth.

The patterns that I cover are architectural patterns that describe the overall structure of an embedded system. Douglass (Douglass 2002) describes lower-level, real-time design patterns that support more detailed design decision making. These patterns include design patterns for execution control, communications, resource allocation, and safety and reliability.

Figure 21.6 The Observe and React pattern

Name: Observe and React
Description: The input values of a set of sensors of the same type are collected and analyzed. These values are displayed in some way. If the sensor values indicate that some exceptional condition has arisen, then actions are initiated to draw the operator’s attention to that value and, if necessary, take actions in response to the exceptional value.
Stimuli: Values from sensors attached to the system.
Responses: Outputs to display, alarm triggers, signals to reacting systems.
Processes: Observer, Analysis, Display, Alarm, Reactor.
Used in: Monitoring systems, alarm systems.

Figure 21.7 The Observe and React process structure (an observer process collects sensor values and passes them to an analysis process, which drives display, alarm, and reactor processes; reactor processes signal other equipment)

These architectural patterns should be the starting point for an embedded systems design; however, they are not design templates. If you use them as such, you will probably end up with an inefficient process architecture. You have to optimize the process structure to ensure that you do not have too many processes. You also should ensure that there is a clear correspondence between the processes and the sensors and actuators in the system.

21.2.1 Observe and react

Monitoring systems are an important class of embedded real-time systems. A monitoring system examines its environment through a set of sensors and usually displays the state of the environment in some way. This could be on a built-in screen, on special-purpose instrument displays, or on a remote display. If the system detects some exceptional event or sensor state, the monitoring system takes some action.

Figure 21.8 The process structure of a burglar alarm system (door sensor, movement detector, window sensor, control panel, voltage monitor, and testing processes communicate with a system controller process, which drives console display, power management, audible alarm, lighting control, and external alert processes)

This often involves raising an alarm to draw an operator’s attention to the event. Sometimes the system may initiate some other preventative action, such as shutting down the system to preserve it from damage.

The Observe and React pattern (Figures 21.6 and 21.7) is commonly used in monitoring systems. The values of sensors are observed, and the system initiates actions that depend on these sensor values. Monitoring systems may be composed of several instantiations of the Observe and React pattern, one for each type of sensor in the system. Depending on the system requirements, you may then optimize the design by combining processes (e.g., you may use a single display process to display the information from all of the different types of sensor).

As an example of the use of this pattern, consider the design of a burglar alarm system to be installed in an office building:

A software system is to be implemented as part of a burglar alarm system for commercial buildings. This uses several different types of sensors. These sensors include movement detectors in individual rooms, door sensors that detect corridor doors opening, and window sensors on ground-floor windows that can detect when a window has been opened.

When a sensor detects the presence of an intruder, the system automatically calls the local police and, using a voice synthesizer, reports the location of the alarm. It switches on lights in the rooms around the active sensor and sets off an audible alarm. The sensor system is normally powered by mains power but is equipped with a battery backup. Power loss is detected using a separate power circuit monitor that monitors the mains voltage. If a voltage drop is detected, the system assumes that intruders have interrupted the power supply, so an alarm is raised.

A process architecture for the alarm system is shown in Figure 21.8. The arrows represent signals sent from one process to another. This system is a “soft” real-time system that does not have stringent timing requirements. The sensors only need to detect the presence of people rather than high-speed events, so they only need to be polled 2 or 3 times per second. I cover the timing requirements for this system in Section 21.3.

Figure 21.9 The Environmental Control pattern

Name: Environmental Control
Description: The system analyzes information from a set of sensors that collect data from the system’s environment. Further information may also be collected on the state of the actuators that are connected to the system. Based on the data from the sensors and actuators, control signals are sent to the actuators, which then cause changes to the system’s environment. Information about the sensor values and the state of the actuators may be displayed.
Stimuli: Values from sensors attached to the system and the state of the system actuators.
Responses: Control signals to actuators, display information.
Processes: Monitor, Control, Display, Actuator driver, Actuator monitor.
Used in: Control systems.

Figure 21.10 The Environmental Control process structure (a monitor process collects sensor values and passes them to a control process, which sends control instructions to an actuator driver process and display values to a display process; an actuator monitor process reports actuator state back to the control process)

I have already introduced the stimuli and responses in this alarm system in Figure 21.1. These responses are used as a starting point for the system design. The Observe and React pattern is used in this design. There are observer processes associated with each type of sensor and reactor processes for each type of reaction. A single analysis process checks the data from all of the sensors. The display processes in the pattern are combined into a single display process.

21.2.2 Environmental Control

The most widespread use of real-time embedded software is in control systems. In these systems, the software controls the operation of equipment, based on stimuli from the equipment’s environment. For example, an anti-skid braking system in a car monitors the car’s wheels and brake system (the system’s environment). It looks for signs that the wheels are skidding when brake pressure is applied. If this is the case, the system adjusts the brake pressure to stop the wheels locking and reduce the likelihood of a skid.

Figure 21.11 Control system architecture for an anti-skid braking system (a pedal monitor process reads the pedal pressure sensor and a wheel monitor process reads the wheel sensors; an analysis process signals the four brake processes, one for each wheel)

Control systems may make use of the Environmental Control pattern, which is a general control pattern that includes sensor and actuator processes. This pattern is described in Figure 21.9, with the process architecture shown in Figure 21.10. A variant of this pattern leaves out the display process. This variant is used in situations where user intervention is not required or where the rate of control is so high that a display would not be meaningful.

This pattern can be the basis for a control system design, with an instantiation of the Environmental Control pattern for each actuator (or actuator type) being controlled. You then optimize the design to reduce the number of processes. For example, you may combine actuator monitoring and actuator control processes, or you may have a single monitoring and control process for several actuators. The optimizations that you choose depend on the timing requirements. You may need to monitor sensors more frequently than you send control signals, in which case it may be impractical to combine control and monitoring processes. There may also be direct feedback between the actuator control and the actuator monitoring process. This allows fine-grain control decisions to be made by the actuator control process.

You can see how this pattern is used in Figure 21.11, which shows an example of a controller for a car braking system. The starting point for the design is associating an instance of the pattern with each actuator type in the system. In this case, there are four actuators, with each controlling the brake on one wheel. The individual sensor processes are combined into a single wheel-monitoring process that monitors the sensors on all wheels. This process monitors the state of each wheel to check if the wheel is turning or locked. A separate process monitors the pressure on the brake pedal exerted by the car driver.

Figure 21.12 The Process Pipeline pattern

Name: Process Pipeline
Description: A pipeline of processes is set up, with data moving in sequence from one end of the pipeline to another. The processes are often linked by synchronized buffers to allow the producer and consumer processes to run at different speeds. The culmination of a pipeline may be display or data storage, or the pipeline may terminate in an actuator.
Stimuli: Input values from the environment or some other process.
Responses: Output values to the environment or a shared buffer.
Processes: Producer, Buffer, Consumer.
Used in: Data acquisition systems, multimedia systems.

Figure 21.13 Process Pipeline process structure (produced data flows from a producer process through alternating buffer and consumer processes to the end of the pipeline)

The system includes an anti-skid feature, which is triggered if the sensors indicate that a wheel is locked when the brake has been applied. This means that there is insufficient friction between the road and the tire; in other words, the car is skidding. If the wheel is locked, the driver cannot steer that wheel. To counteract this effect, the system sends a rapid sequence of on/off signals to the brake on that wheel, which allows the wheel to turn and control to be regained.

The Wheel monitor process monitors whether or not each wheel is turning. If a wheel is skidding (not turning), it informs the Analysis process. This then signals the processes associated with the wheels that are skidding to initiate anti-skid braking.

21.2.3 Process pipeline

Many real-time systems are concerned with collecting analog data from the system’s environment. They then digitize that data for analysis and processing by the system. The system may also convert digital data to analog data, which it then sends to its environment. For example, a software radio accepts incoming packets of digital data representing the radio transmission and transforms the data into a sound signal that people can listen to.

The data processing involved in many of these systems has to be carried

out very

quickly. Otherwise, incoming data may be lost and outgoing signals may

be broken

up because essential information is missing. The Process Pipeline pattern

makes

this rapid processing possible by breaking down the required data

processing into a

sequence of separate transformations. Each of these transformations is

implemented

626 Chapter 21 Real-time software engineering

[Figure 21.14 Neutron flux data acquisition: neutron flux sensors supply a sensor identifier and flux level to an A-D convertor, which places raw data in a buffer; flux processing extracts raw data and places flux values in a second buffer; processed flux values go to storage and to a display]

by an independent process. This architecture is efficient for systems that use multiple processors or multicore processors. Each process in the pipeline can be associated with a separate processor or core, so that the processing steps can be carried out in parallel.

Figure 21.12 is a brief description of the data pipeline pattern, and Figure 21.13 shows the process architecture for this pattern. Notice that the processes involved produce and consume information. The processes exchange information using synchronized buffers, as I explained in Section 21.1. Producer and consumer processes can thereby operate at different speeds without data losses.

An example of a system that may use a process pipeline is a high-speed data acquisition system. Data acquisition systems collect data from sensors for subsequent processing and analysis. These systems are used in situations where the sensors are collecting large volumes of data from the system's environment and it isn't possible or necessary to process that data in real time. Rather, it is collected and stored for later analysis. Data acquisition systems are often used in scientific experiments and process control systems where physical processes, such as chemical reactions, are very rapid. In these systems, the sensors may be generating data very quickly, and the data acquisition system has to ensure that a sensor reading is collected before the sensor value changes.

Figure 21.14 is a simplified model of a data acquisition system that might be part of the control software in a nuclear reactor. This system collects data from sensors monitoring the neutron flux (the density of neutrons) in the reactor. The sensor data is placed in a buffer from which it is extracted and processed. The average flux level is displayed on an operator's display and stored for future processing.
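As an illustration, the pipeline stages of Figure 21.14 can be sketched with threads and synchronized buffers. This is a minimal sketch, not production code: Python's queue.Queue stands in for the synchronized buffers, so the producer and consumer stages can run at different speeds without losing data. The stage names follow the figure, but the conversion and processing functions and the sample values are invented for the example.

```python
import queue
import threading

raw_data_buffer = queue.Queue()    # between A-D conversion and flux processing
flux_value_buffer = queue.Queue()  # between flux processing and storage/display

def a_d_convertor(samples):
    # Digitize each analog sensor reading and place it in the raw data buffer.
    for sensor_id, level in samples:
        raw_data_buffer.put((sensor_id, round(level)))
    raw_data_buffer.put(None)      # sentinel: no more data

def flux_processing():
    # Transform each raw reading into a processed flux value.
    while (item := raw_data_buffer.get()) is not None:
        sensor_id, raw = item
        flux_value_buffer.put((sensor_id, raw * 2))  # illustrative transform
    flux_value_buffer.put(None)

samples = [(1, 10.2), (2, 11.7)]
stages = [threading.Thread(target=a_d_convertor, args=(samples,)),
          threading.Thread(target=flux_processing)]
for s in stages:
    s.start()
for s in stages:
    s.join()

results = []
while (item := flux_value_buffer.get()) is not None:
    results.append(item)
print(results)  # -> [(1, 20), (2, 24)]
```

Because each stage is an independent thread communicating only through buffers, the stages could equally be assigned to separate cores, which is the point of the pattern.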

21.3 Timing analysis

As I discussed in the introduction to this chapter, the correctness of a real-time system depends not just on the correctness of its outputs but also on the time at which these outputs were produced. Therefore, timing analysis is an important activity in the embedded, real-time software development process. In such an analysis, you calculate how often each process in the system must be executed to ensure that all inputs


are processed and all system responses are produced in a timely way. The results of the timing analysis are used to decide how frequently each process should execute and how these processes should be scheduled by the real-time operating system.

Timing analysis for real-time systems is particularly difficult when the system has to deal with a mixture of periodic and aperiodic stimuli and responses. Because aperiodic stimuli are unpredictable, you have to make assumptions about the probability of these stimuli occurring and therefore requiring service at any particular time. These assumptions may be incorrect, and system performance after delivery may not be adequate. Cooling's book (Cooling 2003) discusses techniques for real-time system performance analysis that take aperiodic events into account.

As computers have become faster, it has become possible in many systems to design using only periodic stimuli. When processors were slow, aperiodic stimuli had to be used to ensure that critical events were processed before their deadline, as delays in processing usually involved some loss to the system. For example, the failure of a power supply in an embedded system may mean that the system has to shut down attached equipment in a controlled way, within a very short time (say 50 milliseconds). This could be implemented as a "power fail" interrupt. However, it can also be implemented using a periodic process that runs frequently and checks the power. As long as the time between process invocations is short, there is still time to perform a controlled shutdown of the system before the lack of power causes damage. For this reason, I only discuss timing issues for periodic processes.

When you are analyzing the timing requirements of embedded real-time systems and designing systems to meet these requirements, you have to consider three key factors:

1. Deadlines The times by which stimuli must be processed and some response produced by the system. If the system does not meet a deadline, then, if it is a hard real-time system, this is a system failure; in a soft real-time system, it results in degraded system service.

2. Frequency The number of times per second that a process must execute so that you are confident that it can always meet its deadlines.

3. Execution time The time required to process a stimulus and produce a response.

Execution time is not always the same because of the conditional execution of code, delays waiting for other processes, and so on. Therefore, you may have to consider both the average execution time of a process and the worst-case execution time for that process. The worst-case execution time is the maximum time that the process takes to execute. In a hard real-time system, you may have to make assumptions based on the worst-case execution time to ensure that deadlines are not missed. In soft real-time systems, you can base your calculations on the average execution time.

To continue the example of a power supply failure, let's calculate the worst-case execution time for a process that switches equipment power from mains


[Figure 21.15 Power failure timing analysis: a timeline showing the supply voltage falling from the normal voltage level past readings R1, R2, R3, and R4 to the critical voltage level, followed by the power switcher and battery startup; the time axis runs from 4 ms to 40 ms in 4 ms steps]

power to a battery backup. Figure 21.15 presents a timeline showing the events in the system:

1. Assume that, after a mains power failure event, it takes 50 milliseconds (ms) for the supplied voltage to drop to a level where the equipment may be damaged. The battery backup must therefore be activated and in operation within 50 ms. Usually, you allow for a margin of error, so you should set a shorter deadline of 40 ms because of physical variations in the equipment. This means that all equipment must be running on the battery backup power supply within 40 ms.

2. However, the battery backup system cannot be instantaneously activated. It takes 16 ms from starting the backup power supply to the supply being fully operational. This means that the time available to detect the power failure and start the battery backup system is 24 ms.

3. There is a process that is scheduled to run 250 times per second, that is, every 4 ms. This process assumes that there is a power supply problem if a significant drop in voltage occurs between readings and is sustained for three readings. This time is allowed so that temporary fluctuations do not cause a switch to the battery backup system.

4. In the above timeline, the power fails immediately after a reading has been taken. Therefore, reading R1 is the start reading for the power fail check. The voltage continues to drop for readings R2–R4, so a power failure is assumed. This is the worst possible case, where a power failure event occurs immediately after a sensor check, so 16 ms have elapsed since that event.

5. At this stage, the process that switches to the battery backup is started. Because the battery backup takes 16 ms to become operational, the worst-case execution time for this process is 8 ms, so that the 40 ms deadline can be reached.
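The budget arithmetic in this worked example can be written down directly. A minimal sketch using the figures above (the constant and function names are illustrative):

```python
# Worst-case timing budget for the power-failure example (all times in ms).
DEADLINE = 40          # deadline with safety margin (voltage critical at 50 ms)
POLL_PERIOD = 4        # voltage-check process runs 250 times per second
CONFIRM_READINGS = 3   # sustained drop needed over three readings (R2-R4)
BATTERY_STARTUP = 16   # time for the backup supply to become fully operational

def switch_process_budget():
    # Worst case: power fails immediately after a reading, so the start
    # reading R1 is one period later and the drop is confirmed three
    # periods after that -- (1 + 3) * 4 = 16 ms of detection time.
    detection_time = (1 + CONFIRM_READINGS) * POLL_PERIOD
    # Whatever remains after detection and battery startup is the maximum
    # allowed worst-case execution time of the power-switching process.
    return DEADLINE - detection_time - BATTERY_STARTUP

print(switch_process_budget())  # -> 8
```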


Figure 21.16 Timing requirements for the burglar alarm system

Stimulus/Response    Timing requirements
Audible alarm        The audible alarm should be switched on within half a second of an alarm being raised by a sensor.
Communications       The call to the police should be started within 2 seconds of an alarm being raised by a sensor.
Door alarm           Each door alarm should be polled twice per second.
Lights switch        The lights should be switched on within half a second of an alarm being raised by a sensor.
Movement detector    Each movement detector should be polled twice per second.
Power failure        The switch to backup power must be completed within a deadline of 50 ms.
Voice synthesizer    A synthesized message should be available within 2 seconds of an alarm being raised by a sensor.
Window alarm         Each window alarm should be polled twice per second.

The starting point for timing analysis in a real-time system is the timing requirements, which should set out the deadlines for each required response in the system. Figure 21.16 shows possible timing requirements for the office building burglar alarm system discussed in Section 21.2.1. To simplify this example, let us ignore stimuli generated by system testing procedures and external signals to reset the system in the event of a false alarm. This means there are only two types of stimulus processed by the system:

1. Power failure is detected by observing a voltage drop of more than 20%. The required response is to switch the circuit to backup power by signaling an electronic power-switching device that switches the mains power to battery backup.

2. Intruder alarm is a stimulus generated by one of the system sensors. The response to this stimulus is to compute the room number of the active sensor, set up a call to the police, initiate the voice synthesizer to manage the call, and switch on the audible intruder alarm and building lights in the area.
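The power-failure detection rule can be sketched in a few lines. The 20% threshold comes from the stimulus description above, and the three-reading confirmation follows the earlier worked example; the normal-voltage value and the names are illustrative assumptions:

```python
NORMAL_VOLTAGE = 240.0     # illustrative mains voltage
DROP_THRESHOLD = 0.20      # a drop of more than 20% signals a problem
SUSTAINED_READINGS = 3     # confirm over three readings, as in Section 21.3

def power_failed(readings):
    """Return True if the last three readings all show a >20% drop."""
    low = [v < NORMAL_VOLTAGE * (1 - DROP_THRESHOLD) for v in readings]
    # Report a failure only when the drop is sustained, so that temporary
    # fluctuations do not cause a switch to the battery backup.
    return len(low) >= SUSTAINED_READINGS and all(low[-SUSTAINED_READINGS:])

print(power_failed([238.0, 170.0, 168.0, 165.0]))  # -> True
print(power_failed([238.0, 170.0, 239.0, 238.5]))  # -> False
```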

As shown in Figure 21.16, you should list the timing constraints for each class of sensor separately, even when (as in this case) they are the same. By considering them separately, you leave scope for future change and make it easier to compute the number of times the controlling process has to be executed each second.

Allocating the system functions to concurrent processes is the next design stage. Four types of sensors must be polled periodically, each with an associated process: the voltage sensor, door sensors, window sensors, and movement detectors. Normally, the processes associated with the sensors will execute very quickly as all


[Figure 21.17 Alarm process timing: the system controller coordinates periodic processes — door sensor, window sensor, movement detector, voltage monitor, control panel, and console display processes, annotated with frequencies of 50 Hz or 250 Hz and execution times of 0.5 ms or 1 ms — stimulus-driven processes annotated with R — power management (20 ms), audible alarm (5 ms), lighting control (5 ms), and external alert (10 ms) processes — and a background testing process annotated with B]

they are doing is checking whether or not a sensor has changed its status (e.g., from off to on). It is reasonable to assume that the execution time to check and assess the state of one sensor is less than 1 millisecond.

To ensure that you meet the deadlines defined by the timing requirements, you then have to decide how frequently the related processes have to run and how many sensors should be examined during each execution of the process. There are obvious trade-offs here between frequency and execution time:

1. The deadline for detecting a change of state is 0.25 second, which means that each sensor has to be checked 4 times per second. If you examine one sensor during each process execution, then if there are N sensors of a particular type, you must schedule the process 4N times per second to ensure that all sensors are checked within the deadline.

2. If you examine four sensors, say, during each process execution, then the execution time is increased to about 4 ms, but you need only run the process N times/second to meet the timing requirement.
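This trade-off is a single line of arithmetic. A minimal sketch (the sensor count of 20 is an arbitrary example value):

```python
def executions_per_second(deadline_s, n_sensors, group_size):
    # A 0.25 s detection deadline means each sensor needs 4 checks/second.
    checks_per_sensor = 1 / deadline_s
    # Each execution covers group_size sensors, so the process must run
    # (checks_per_sensor * n_sensors) / group_size times per second.
    return checks_per_sensor * n_sensors / group_size

print(executions_per_second(0.25, 20, 1))  # one sensor per execution -> 80.0
print(executions_per_second(0.25, 20, 4))  # four per execution -> 20.0
```

Grouping sensors trades a longer execution time per invocation for a lower invocation frequency.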

In this case, because the system requirements define actions when two or more sensors are positive, the best strategy is to examine sensors in groups, with groups based on the physical proximity of the sensors. If an intruder has entered the building, then it will probably be adjacent sensors that are positive.

When you have completed the timing analysis, you may then annotate the process model with information about the frequency of execution and the expected execution time of each process (see Figure 21.17). Here, periodic processes are annotated with their frequency, processes that are started in response to a stimulus are annotated with R, and the testing process is a background process, annotated with B. This background process


only runs when processor time is available. In general, it is simpler to design a system so that there are a small number of process frequencies. The execution times represent the required worst-case execution times of the processes.

The final step in the design process is to design a scheduling system that will ensure that a process will always be scheduled to meet its deadlines. You can only do this if you know the scheduling approaches that are supported by the real-time operating system (OS) used (Burns and Wellings 2009). The scheduler in the real-time OS allocates a process to a processor for a given amount of time. The time can be fixed, or it may vary depending on the priority of the process.

In allocating process priorities, you have to consider the deadlines of each process so that processes with short deadlines receive processor time to meet these deadlines. For example, the voltage monitor process in the burglar alarm needs to be scheduled so that voltage drops can be detected and a switch made to backup power before the system fails. This should therefore have a higher priority than the processes that check sensor values, as these have fairly relaxed deadlines compared to their expected execution time.

21.4 Real-time operating systems

The execution platform for most application systems is an operating system that manages shared resources and provides features such as a file system and runtime process management. However, the extensive functionality in a conventional operating system takes up a great deal of space and slows down the operation of programs. Furthermore, the process management features in the system may not be designed to allow fine-grain control over the scheduling of processes.

For these reasons, standard operating systems, such as Linux and Windows, are not normally used as the execution platform for real-time systems. Very simple embedded systems may be implemented as "bare metal" systems. These systems provide their own execution support and so include system startup and shutdown, process and resource management, and process scheduling. More commonly, however, embedded applications are built on top of a real-time operating system (RTOS), which is an efficient operating system that offers the features needed by real-time systems. Examples of RTOSs are Windows Embedded Compact, VxWorks, and RTLinux.

A real-time operating system manages processes and resource allocation for a real-time system. It starts and stops processes so that stimuli can be handled, and it allocates memory and processor resources. The components of an RTOS (Figure 21.18) depend on the size and complexity of the real-time system being developed. For all except the simplest systems, they usually include:

1. A real-time clock, which provides the information required to schedule processes periodically.

2. If interrupts are supported, an interrupt handler, which manages aperiodic requests for service.


[Figure 21.18 Components of a real-time operating system: the real-time clock and interrupt handler supply scheduling information to the scheduler; the resource manager matches process resource requirements against the available resource list for processes awaiting resources, releasing resources when done; ready processes on the ready list and the processor list feed the dispatcher, which starts the executing process]

3. A scheduler, which is responsible for examining the processes that can be executed and for choosing one of these processes for execution.

4. A resource manager, which allocates appropriate memory and processor resources to processes that have been scheduled for execution.

5. A dispatcher, which is responsible for starting the execution of processes.

Real-time operating systems for large systems, such as process control or telecommunication systems, may have additional facilities, namely, disk storage management, fault management facilities that detect and report system faults, and a configuration manager that supports the dynamic reconfiguration of real-time applications.

21.4.1 Process management

Real-time systems have to handle external events quickly and, in some cases, meet deadlines for processing these events. The event-handling processes must therefore be scheduled for execution in time to detect the event. They must also be allocated sufficient processor resources to meet their deadline. The process manager in an RTOS is responsible for choosing processes for execution, allocating processor and memory resources, and starting and stopping process execution on a processor.


[Figure 21.19 RTOS actions required to start a process: the scheduler chooses a process for execution from the process queue; the resource manager allocates memory and a processor, using the memory map and processor list; the dispatcher starts execution on an available processor, using the ready list]

The process manager has to manage processes with different priorities. For some stimuli, such as those associated with certain exceptional events, it is essential that their processing should be completed within the specified time limits. Other processes may be safely delayed if a more critical process requires service. Consequently, the RTOS has to be able to manage at least two priority levels for system processes:

1. Clock level This level of priority is allocated to periodic processes.

2. Interrupt level This is the highest priority level. It is allocated to processes that need a very fast response. One of these processes will be the real-time clock process. This process is not required if interrupts are not supported in the system.

A further priority level may be allocated to background processes (such as a self-checking process) that do not need to meet real-time deadlines. These processes are scheduled for execution when processor capacity is available.

Periodic processes must be executed at specified time intervals for data acquisition and actuator control. In most real-time systems, there will be several types of periodic process. Using the timing requirements specified in the application program, the RTOS arranges the execution of periodic processes so that they can all meet their deadlines.

The actions taken by the operating system for periodic process management are shown in Figure 21.19. The scheduler examines the list of periodic processes and selects a process to be executed. The choice depends on the process priority, the process periods, the expected execution times, and the deadlines of the ready processes. Sometimes two processes with different deadlines should be executed at the same clock tick. In such a situation, one process must be delayed. Normally, the system will choose to delay the process with the longest deadline.
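This tie-breaking rule (delay the longest deadline, or equivalently run the shortest deadline first) can be sketched in a couple of lines; the process names and deadline values are illustrative:

```python
def choose_to_run(ready):
    """ready: list of (name, deadline_ms) pairs; run the shortest deadline."""
    return min(ready, key=lambda p: p[1])[0]

# Two processes are ready at the same clock tick; the door sensor process,
# with the longer deadline, is the one that gets delayed.
print(choose_to_run([("voltage_monitor", 20), ("door_sensor", 250)]))
# -> voltage_monitor
```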

Processes that have to respond quickly to asynchronous events may be interrupt-driven. The computer's interrupt mechanism causes control to transfer to a predetermined memory location. This location contains an instruction to jump to a simple and fast interrupt service routine. The service routine disables further interrupts to avoid being interrupted itself. It then discovers the cause of the interrupt and initiates, with a high priority, a process to handle the stimulus causing the interrupt. In some high-speed data acquisition systems, the interrupt handler saves the data that the interrupt signaled was available in a buffer for later processing. Interrupts are then enabled again, and control is returned to the operating system.
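The save-to-buffer-and-defer structure described above can be simulated in outline. A real service routine would be written against the hardware's interrupt vector rather than in Python; this hypothetical sketch only shows the shape of the pattern:

```python
from collections import deque

data_buffer = deque()        # data saved by the handler for later processing
interrupts_enabled = True

def interrupt_service_routine(data):
    global interrupts_enabled
    interrupts_enabled = False   # disable interrupts: avoid being interrupted
    data_buffer.append(data)     # save the data; defer the real processing
    interrupts_enabled = True    # re-enable interrupts and return

# Three interrupts signal that new sensor data is available.
for sample in (17, 42, 99):
    interrupt_service_routine(sample)

print(list(data_buffer))  # -> [17, 42, 99]
```

The handler itself does almost no work; a lower-priority process drains the buffer later, which is what keeps the interrupt response fast.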


At any one time, several processes, all with different priorities, could be executed. The process scheduler implements system-scheduling policies that determine the order of process execution. There are two commonly used scheduling strategies:

1. Nonpreemptive scheduling After a process has been scheduled for execution, it runs to completion or until it is blocked for some reason, such as waiting for input. This can cause problems if there are processes with different priorities and a high-priority process has to wait for a low-priority process to finish.

2. Preemptive scheduling The execution of an executing process may be stopped if a higher-priority process requires service. The higher-priority process preempts the execution of the lower-priority process and is allocated to a processor.

Within these strategies, different scheduling algorithms have been developed. These include round-robin scheduling, where each process is executed in turn; rate monotonic scheduling, where the process with the shortest period (highest frequency) is given priority; and shortest deadline first scheduling, where the process in the queue with the shortest deadline is scheduled (Burns and Wellings 2009).
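Rate monotonic priority assignment can be sketched briefly, together with the classic Liu and Layland utilization bound, which is a sufficient (not necessary) schedulability check. The task set and its periods and worst-case execution times are invented for illustration:

```python
def rate_monotonic_priorities(tasks):
    """tasks: (name, period_ms, wcet_ms) tuples; shortest period first."""
    return sorted(tasks, key=lambda t: t[1])

def schedulable(tasks):
    # Liu & Layland bound: a task set of n periodic tasks is schedulable
    # under rate monotonic priorities if total utilization <= n*(2^(1/n)-1).
    n = len(tasks)
    utilization = sum(c / p for _, p, c in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)

tasks = [("voltage", 4, 1), ("door", 20, 1), ("window", 20, 1)]
print([name for name, _, _ in rate_monotonic_priorities(tasks)])
# -> ['voltage', 'door', 'window']
print(schedulable(tasks))  # utilization 0.35 <= bound ~0.78 -> True
```

The bound is conservative: a task set that fails it may still be schedulable, which an exact response-time analysis would reveal.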

Information about the process to be executed is passed to the resource manager. The resource manager allocates memory and, in a multiprocessor system, also adds a processor to this process. The process is then placed on the "ready list," a list of processes that are ready for execution. When a processor finishes executing a process and becomes available, the dispatcher is invoked. It scans the ready list to find a process that can be executed on the available processor and starts its execution.

Key Points

An embedded software system is part of a hardware/software system that reacts to events in its environment. The software is "embedded" in the hardware. Embedded systems are normally real-time systems.

A real-time system is a software system that must respond to events in real time. System correctness does not just depend on the results it produces, but also on the time when these results are produced.

Real-time systems are usually implemented as a set of communicating processes that react to stimuli to produce responses.

State models are an important design representation for embedded real-time systems. They are used to show how the system reacts to its environment as events trigger changes of state in the system.

Several standard patterns can be observed in different types of embedded system. These include a pattern for monitoring the system's environment for adverse events, a pattern for actuator control, and a data-processing pattern.


Designers of real-time systems have to do a timing analysis, which is driven by the deadlines for processing and responding to stimuli. They have to decide how often each process in the system should run and the expected and worst-case execution time for processes.

A real-time operating system is responsible for process and resource management. It always includes a scheduler, which is the component responsible for deciding which process should be scheduled for execution.

Further Reading

Real-Time Systems and Programming Languages: Ada, Real-Time Java and C/Real-Time POSIX, 4th ed. An excellent and comprehensive text that provides broad coverage of all aspects of real-time systems. (A. Burns and A. Wellings, Addison-Wesley, 2009).

"Trends in Embedded Software Engineering." This article suggests that model-driven development (as discussed in Chapter 5 of this book) will become an important approach to embedded systems development. It is part of a special issue on embedded systems, and other articles, such as the one by Ebert and Jones, are also useful reading. (IEEE Software, 26 (3), May–June 2009). http://dx.doi.org/10.1109/MS.2009.80

Real-Time Systems: Design Principles for Distributed Embedded Applications, 2nd ed. This is a comprehensive textbook on modern real-time systems, which may be distributed and mobile systems. The author focuses on hard real-time systems and covers important topics such as Internet connectivity and power management. (H. Kopetz, Springer, 2013).

Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/systems-engineering/

Exercises

21.1. Explain why responsiveness in real time is the critical difference between embedded systems and other software systems.

21.2. Identify possible stimuli and the expected responses for an embedded system that controls a home refrigerator or a domestic washing machine.

21.3. Using the state-based approach to modeling, as discussed in Section 21.1.1, model the operation of the embedded software for a voicemail system that is included in a landline phone.


Train protection system

The system acquires information on the speed limit of a segment from a trackside transmitter, which continually broadcasts the segment identifier and its speed limit. The same transmitter also broadcasts information on the status of the signal controlling that track segment. The time required to broadcast track segment and signal information is 50 ms.

The train can receive information from the trackside transmitter when it is within 10 m of a transmitter.

The maximum train speed is 180 kph.

Sensors on the train provide information about the current train speed (updated every 250 ms) and the train brake status (updated every 100 ms).

If the train speed exceeds the current segment speed limit by more than 5 kph, a warning is sounded in the driver's cabin. If the train speed exceeds the current segment speed limit by more than 10 kph, the train's brakes are automatically applied until the speed falls to the segment speed limit. Train brakes should be applied within 100 ms of the time when the excessive train speed has been detected.

If the train enters a track segment that is signaled with a red light, the train protection system applies the train brakes and reduces the speed to zero. Train brakes should be applied within 100 ms of the time when the red light signal is received.

The system continually updates a status display in the driver's cabin.

Figure 21.20 Requirements for a train protection system

This should display the number of recorded messages on an LED display and should allow the user to dial in and listen to the recorded messages.

21.4. What are the commonly used architectural patterns in real-time systems and when are they used?

21.5. Show how the Environmental Control pattern could be used as the basis of the design of a system to control the temperature in a greenhouse. The temperature should be between 10 and 30 degrees Celsius. If it falls below 10 degrees, the heating system should be switched on; if it goes above 30, the windows should be automatically opened.

21.6. Design a process architecture for an environmental monitoring system that collects data from a set of air quality sensors situated around a city. There are 5000 sensors organized into 100 neighborhoods. Each sensor must be interrogated four times per second. When more than 30% of the sensors in a particular neighborhood indicate that the air quality is below an acceptable level, local warning lights are activated. All sensors return the readings to a central computer, which generates reports every 15 minutes on the air quality in the city.

21.7. A train protection system automatically applies the brakes of a train if the speed limit for a segment of track is exceeded or if the train enters a track segment that is currently signaled with a red light (i.e., the segment should not be entered). Details are shown in Figure 21.20. Identify the stimuli that must be processed by the on-board train control system and the associated responses to these stimuli.


21.8. Suggest a possible process architecture for this system.

21.9. If a periodic process in the on-board train protection system is used to collect data from the trackside transmitter, how often must it be scheduled to ensure that the system is guaranteed to collect information from the transmitter? Explain how you arrived at your answer.

21.10. With the help of examples, define what a real-time operating system is. Explain how it is different from a conventional operating system. What are the components included in real-time operating systems, and what are their responsibilities?

References

Berry, G. 1989. "Real-Time Programming: Special-Purpose or General-Purpose Languages." In Information Processing, edited by G. Ritter, 89:11–17. Amsterdam: Elsevier Science Publishers.

Bruno, E. J., and G. Bollella. 2009. Real-Time Java Programming: With Java RTS. Boston: Prentice-Hall.

Burns, A., and A. Wellings. 2009. Real-Time Systems and Programming Languages: Ada, Real-Time Java and C/Real-Time POSIX. Boston: Addison-Wesley.

Cooling, J. 2003. Software Engineering for Real-Time Systems. Harlow, UK: Addison-Wesley.

Douglass, B. P. 1999. Real-Time UML: Developing Efficient Objects for Embedded Systems, 2nd ed. Boston: Addison-Wesley.

––––––. 2002. Real-Time Design Patterns: Robust Scalable Architecture for Real-Time Systems. Boston: Addison-Wesley.

Ebert, C., and C. Jones. 2009. "Embedded Software: Facts, Figures and Future." IEEE Computer 26 (3): 42–52. doi:10.1109/MC.2009.118.

Harel, D. 1987. "Statecharts: A Visual Formalism for Complex Systems." Sci. Comput. Programming 8 (3): 231–274. doi:10.1016/0167-6423(87)90035-9.

––––––. 1988. "On Visual Formalisms." Comm. ACM 31 (5): 514–530. doi:10.1145/42411.42414.

Lee, E. A. 2002. "Embedded Software." In Advances in Computers, edited by M. Zelkowitz. Vol. 56. London: Academic Press.

Silberschatz, A., P. B. Galvin, and G. Gagne. 2013. Operating System Concepts, 9th ed. New York: John Wiley & Sons.

Stallings, W. 2014. Operating Systems: Internals and Design Principles, 8th ed. Boston: Prentice-Hall.


PART 4 Software Management

It is sometimes suggested that the key difference between software engineering and other types of programming is that software engineering is a managed process. By this, I mean that the software development takes place within an organization and is subject to a range of schedule, budget and organizational constraints. I introduce a range of management topics in this part of the book, with a focus on technical management issues rather than 'softer' management issues such as people management, or the more strategic management of enterprise systems.

Chapters 22 and 23 focus on the essential project management activities: planning, risk management and people management. Chapter 22 introduces software project management, and its first major section is concerned with risk management, where managers identify what might go wrong and plan what they might do about it. This chapter also includes sections on people management and team working.

Chapter 23 covers project planning and estimation. I introduce bar charts as fundamental planning tools and explain why plan-driven development will remain an important development approach, in spite of the success of agile methods. I also discuss issues that influence the price charged for a system and techniques of software cost estimation. I use the COCOMO family of cost models to describe algorithmic cost modeling and explain the benefits and disadvantages of algorithmic approaches.

Chapter 24 explains the basics of software quality management, as practised in large projects. Quality management is concerned with processes and techniques for ensuring and improving the quality of software. I discuss the importance of standards in quality management and the use of reviews and inspections in the quality assurance process. The final section of this chapter covers software measurement, and I discuss the benefits and problems of using metrics and software data analytics in quality management.

Finally, Chapter 25 discusses configuration management, a critical issue for all large systems. However, the need for configuration management is not always obvious to students who have only been concerned with personal software development, so I describe the various aspects of this topic here, including version management, system building, change management and release management. I explain why continuous integration or daily system building is important. An important change in this edition is the inclusion of new material on distributed version management systems, such as Git, which are being increasingly used to support software engineering by distributed teams.

22 Project management

Objectives

The objective of this chapter is to introduce software project management and two important management activities, namely, risk management and people management. When you have read the chapter you will:

know the principal tasks of software project managers;
have been introduced to the notion of risk management and some of the risks that can arise in software projects;
understand factors that influence personal motivation and what these might mean for software project managers;
understand key issues that influence team working, such as team composition, organization, and communication.

Contents

22.1 Risk management
22.2 Managing people
22.3 Teamwork

642 Chapter 22 Project management

Software project management is an essential part of software engineering. Projects need to be managed because professional software engineering is always subject to organizational budget and schedule constraints. The project manager’s job is to ensure that the software project meets and overcomes these constraints as well as delivering high-quality software. Good management cannot guarantee project success. However, bad management usually results in project failure: The software may be delivered late, cost more than originally estimated, or fail to meet the expectations of customers.

The success criteria for project management obviously vary from project to project, but, for most projects, important goals are:

to deliver the software to the customer at the agreed time;
to keep overall costs within budget;
to deliver software that meets the customer’s expectations;
to maintain a coherent and well-functioning development team.

These goals are not unique to software engineering but are the goals of all engineering projects. However, software engineering is different from other types of engineering in a number of ways that make software management particularly challenging. Some of these differences are:

1. The product is intangible A manager of a shipbuilding or a civil engineering project can see the product being developed. If a schedule slips, the effect on the product is visible: parts of the structure are obviously unfinished. Software is intangible. It cannot be seen or touched. Software project managers cannot see progress by looking at the artifact that is being constructed. Rather, they rely on others to produce evidence that they can use to review the progress of the work.

2. Large software projects are often “one-off” projects Every large software development project is unique because every environment where software is developed is, in some ways, different from all others. Even managers who have a large body of previous experience may find it difficult to anticipate problems. Furthermore, rapid technological changes in computers and communications can make experience obsolete. Lessons learned from previous projects may not be readily transferable to new projects.

3. Software processes are variable and organization-specific The engineering process for some types of system, such as bridges and buildings, is well understood. However, different companies use quite different software development processes. We cannot reliably predict when a particular software process is likely to lead to development problems. This is especially true when the software project is part of a wider systems engineering project or when completely new software is being developed.

Because of these issues, it is not surprising that some software projects are late and over budget. Software systems are often new, very complex, and technically innovative. Schedule and cost overruns are also common in other engineering projects, such as new transport systems, that are complex and innovative. Given the difficulties involved, it is perhaps remarkable that so many software projects are delivered on time and to budget.

It is impossible to write a standard job description for a software project manager. The job varies tremendously depending on the organization and the software being developed.

Some of the most important factors that affect how software projects are managed are:

1. Company size Small companies can operate with informal management and team communications and do not need formal policies and management structures. They have less management overhead than larger organizations. In larger organizations, management hierarchies, formal reporting and budgeting, and approval processes must be followed.

2. Software customers If the customer is an internal customer (as is the case for software product development), then customer communications can be informal and there is no need to fit in with the customer’s ways of working. If custom software is being developed for an external customer, agreement has to be reached on more formal communication channels. If the customer is a government agency, the software company must operate according to the agency’s policies and procedures, which are likely to be bureaucratic.

3. Software size Small systems can be developed by a small team, which can get together in the same room to discuss progress and other management issues. Large systems usually need multiple development teams that may be geographically distributed and in different companies. The project manager has to coordinate the activities of these teams and arrange for them to communicate with each other.

4. Software type If the software being developed is a consumer product, formal records of project management decisions are unnecessary. On the other hand, if a safety-critical system is being developed, all project management decisions should be recorded and justified as these may affect the safety of the system.

5. Organizational culture Some organizations have a culture that is based on supporting and encouraging individuals, while others are group focused. Large organizations are often bureaucratic. Some organizations have a culture of taking risks, whereas others are risk averse.

6. Software development processes Agile processes typically try to operate with “lightweight” management. More formal processes require management monitoring to ensure that the development team is following the defined process.

These factors mean that project managers in different organizations may work in quite different ways. However, a number of fundamental project management activities are common to all organizations:

1. Project planning Project managers are responsible for planning, estimating, and scheduling project development and assigning people to tasks. They supervise the work to ensure that it is carried out to the required standards, and they monitor progress to check that the development is on time and within budget.

2. Risk management Project managers have to assess the risks that may affect a project, monitor these risks, and take action when problems arise.

3. People management Project managers are responsible for managing a team of people. They have to choose people for their team and establish ways of working that lead to effective team performance.

4. Reporting Project managers are usually responsible for reporting on the progress of a project to customers and to the managers of the company developing the software. They have to be able to communicate at a range of levels, from detailed technical information to management summaries. They have to write concise, coherent documents that abstract critical information from detailed project reports. They must be able to present this information during progress reviews.

5. Proposal writing The first stage in a software project may involve writing a proposal to win a contract to carry out an item of work. The proposal describes the objectives of the project and how it will be carried out. It usually includes cost and schedule estimates and justifies why the project contract should be awarded to a particular organization or team. Proposal writing is a critical task as the survival of many software companies depends on having enough proposals accepted and contracts awarded.

Project planning is an important topic in its own right, which I discuss in Chapter 23. In this chapter, I focus on risk management and people management.

22.1 Risk management

Risk management is one of the most important jobs for a project manager. You can think of a risk as something that you’d prefer not to have happen. Risks may threaten the project, the software that is being developed, or the organization. Risk management involves anticipating risks that might affect the project schedule or the quality of the software being developed, and then taking action to avoid these risks (Hall 1998; Ould 1999).

Risks can be categorized according to type of risk (technical, organizational, etc.), as I explain in Section 22.1.1. A complementary classification groups risks according to what they affect:

1. Project risks affect the project schedule or resources. An example of a project risk is the loss of an experienced system architect. Finding a replacement architect with appropriate skills and experience may take a long time; consequently, it will take longer to develop the software design than originally planned.

2. Product risks affect the quality or performance of the software being developed. An example of a product risk is the failure of a purchased component to perform as expected. This may affect the overall performance of the system so that it is slower than expected.

3. Business risks affect the organization developing or procuring the software. For example, a competitor introducing a new product is a business risk. The introduction of a competitive product may mean that the assumptions made about sales of existing software products may be unduly optimistic.

Of course, these risk categories overlap. An experienced engineer’s decision to leave a project, for example, presents a project risk because the software delivery schedule will be affected. It inevitably takes time for a new project member to understand the work that has been done, so he or she cannot be immediately productive. Consequently, the delivery of the system may be delayed. The loss of a team member can also be a product risk because a replacement may not be as experienced and so could make programming errors. Finally, losing a team member can be a business risk because an experienced engineer’s reputation may be a critical factor in winning new contracts.

For large projects, you should record the results of the risk analysis in a risk register along with a consequence analysis. This sets out the consequences of the risk for the project, product, and business. Effective risk management makes it easier to cope with problems and to ensure that these do not lead to unacceptable budget or schedule slippage. For small projects, formal risk recording may not be required, but the project manager should still be aware of the risks.
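A minimal risk register entry of the kind described might be sketched as follows. The field names and structure here are my own illustrative choices, not a standard schema:

```python
from dataclasses import dataclass

# Illustrative risk register entry. The fields below are assumptions
# for the sketch; a real register would be tailored to the project.
@dataclass
class RiskEntry:
    description: str
    risk_type: str            # e.g. "people", "technology", "estimation"
    probability: str          # insignificant / low / moderate / high / very high
    effects: str              # catastrophic / serious / tolerable / insignificant
    # Consequence analysis: what the risk means for project, product, business.
    project_consequence: str = ""
    product_consequence: str = ""
    business_consequence: str = ""

entry = RiskEntry(
    description="Loss of an experienced system architect",
    risk_type="people",
    probability="moderate",
    effects="serious",
    project_consequence="Software design takes longer than planned",
    product_consequence="Replacement may introduce design errors",
    business_consequence="Architect's reputation was a factor in winning contracts",
)
```

A separate entry would be kept for each identified risk, and the register revisited at every review.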

The specific risks that may affect a project depend on the project and the organizational environment in which the software is being developed. However, there are also common risks that are independent of the type of software being developed. These can occur in any software development project. Some examples of these common risks are shown in Figure 22.1.

Software risk management is important because of the inherent uncertainties in software development. These uncertainties stem from loosely defined requirements, requirements changes due to changes in customer needs, difficulties in estimating the time and resources required for software development, and differences in individual skills. You have to anticipate risks, understand their impact on the project, the product, and the business, and take steps to avoid these risks. You may need to draw up contingency plans so that, if the risks do occur, you can take immediate recovery action.

An outline of the process of risk management is presented in Figure 22.2. It involves several stages:

1. Risk identification You should identify possible project, product, and business risks.

2. Risk analysis You should assess the likelihood and consequences of these risks.

3. Risk planning You should make plans to address the risk, either by avoiding it or by minimizing its effects on the project.

4. Risk monitoring You should regularly assess the risk and your plans for risk mitigation and revise these plans when you learn more about the risk.
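The four stages above repeat throughout the project. As a rough sketch only (the function names and the list-of-dicts register are illustrative assumptions, not from the text), the cycle might be expressed as:

```python
# Sketch of the iterative risk management cycle described above.
# All names and data are illustrative placeholders.

def identify_risks():
    # Stage 1: brainstormed project, product, and business risks.
    return [
        {"risk": "Staff turnover", "affects": "project"},
        {"risk": "Product competition", "affects": "business"},
    ]

def analyse(risks):
    # Stage 2: attach a judged probability and effect to each risk.
    for r in risks:
        r.setdefault("probability", "moderate")
        r.setdefault("effects", "serious")
    return risks

def plan(risks):
    # Stage 3: record an avoidance, minimization, or contingency strategy.
    for r in risks:
        r.setdefault("strategy", "to be decided at the next review")
    return risks

def monitor(risks):
    # Stage 4: reassess the risks; return True if priorities have changed
    # and the cycle should run again with updated information.
    return False

register = plan(analyse(identify_risks()))
while monitor(register):        # re-analyse as the project evolves
    register = plan(analyse(register))
```

The point of the sketch is the loop: once an initial plan exists, monitoring feeds back into renewed analysis and planning for as long as the project runs.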

Staff turnover (Project): Experienced staff will leave the project before it is finished.
Management change (Project): There will be a change of company management with different priorities.
Hardware unavailability (Project): Hardware that is essential for the project will not be delivered on schedule.
Requirements change (Project and product): There will be a larger number of changes to the requirements than anticipated.
Specification delays (Project and product): Specifications of essential interfaces are not available on schedule.
Size underestimate (Project and product): The size of the system has been underestimated.
Software tool underperformance (Product): Software tools that support the project do not perform as anticipated.
Technology change (Business): The underlying technology on which the system is built is superseded by new technology.
Product competition (Business): A competitive product is marketed before the system is completed.

Figure 22.1 Examples of common project, product, and business risks

Figure 22.2 The risk management process: risk identification produces a list of potential risks; risk analysis produces a prioritized risk list; risk planning produces risk avoidance and contingency plans; and risk monitoring produces a risk assessment.

For large projects, you should document the outcomes of the risk management process in a risk management plan. This should include a discussion of the risks faced by the project, an analysis of these risks, and information on how you plan to manage the risk if it seems likely to be a problem.

The risk management process is an iterative process that continues throughout a project. Once you have drawn up an initial risk management plan, you monitor the situation to detect emerging risks. As more information about the risks becomes available, you have to re-analyze the risks and decide if the risk priority has changed. You may then have to change your plans for risk avoidance and contingency management.

Risk management in agile development is less formal. The same fundamental activities should still be followed and risks discussed, although these may not be formally documented. Agile development reduces some risks, such as risks from requirements changes. However, agile development also has a downside. Because of its reliance on people, staff turnover can have significant effects on the project, product, and business. Because of the lack of formal documentation and its reliance on informal communications, it is very hard to maintain continuity and momentum if key people leave the project.

22.1.1 Risk identification

Risk identification is the first stage of the risk management process. It is concerned with identifying the risks that could pose a major threat to the software engineering process, the software being developed, or the development organization. Risk identification may be a team process in which a team gets together to brainstorm possible risks. Alternatively, project managers may identify risks based on their experience of what went wrong on previous projects.

As a starting point for risk identification, a checklist of different types of risk may be used. Six types of risk may be included in a risk checklist:

1. Estimation risks arise from the management estimates of the resources required to build the system.

2. Organizational risks arise from the organizational environment where the software is being developed.

3. People risks are associated with the people in the development team.

4. Requirements risks come from changes to the customer requirements and the process of managing the requirements change.

5. Technology risks come from the software or hardware technologies that are used to develop the system.

6. Tools risks come from the software tools and other support software used to develop the system.

Figure 22.3 shows examples of possible risks in each of these categories. When you have finished the risk identification process, you should have a long list of risks that could occur and that could affect the product, the process, and the business. You then need to prune this list to a manageable size. If you have too many risks, it is practically impossible to keep track of all of them.

Estimation
1. The time required to develop the software is underestimated.
2. The rate of defect repair is underestimated.
3. The size of the software is underestimated.

Organizational
4. The organization is restructured so that different managements are responsible for the project.
5. Organizational financial problems force reductions in the project budget.

People
6. It is impossible to recruit staff with the skills required.
7. Key staff are ill and unavailable at critical times.
8. Required training for staff is not available.

Requirements
9. Changes to requirements that require major design rework are proposed.
10. Customers fail to understand the impact of requirements changes.

Technology
11. The database used in the system cannot process as many transactions per second as expected.
12. Faults in reusable software components have to be repaired before these components are reused.

Tools
13. The code generated by software code generation tools is inefficient.
14. Software tools cannot work together in an integrated way.

Figure 22.3 Examples of different types of risk

22.1.2 Risk analysis

During the risk analysis process, you have to consider each identified risk and make a judgment about the probability and seriousness of that risk. There is no easy way to do so. You have to rely on your own judgment and experience of previous projects and the problems that arose in them. It is not possible to make a precise, numeric assessment of the probability and seriousness of each risk. Rather, you should assign the risk to one of a number of bands:

1. The probability of the risk might be assessed as insignificant, low, moderate, high, or very high.

2. The effects of the risk might be assessed as catastrophic (threaten the survival of the project), serious (would cause major delays), tolerable (delays are within allowed contingency), or insignificant.

You may then tabulate the results of this analysis process using a table ordered according to the seriousness of the risk. Figure 22.4 illustrates this for the risks that I have identified in Figure 22.3. Obviously, the assessment of probability and seriousness is arbitrary here. To make this assessment, you need detailed information about the project, the process, the development team, and the organization.

Organizational financial problems force reductions in the project budget (5). Probability: Low. Effects: Catastrophic.
It is impossible to recruit staff with the skills required (6). Probability: High. Effects: Catastrophic.
Key staff are ill at critical times in the project (7). Probability: Moderate. Effects: Serious.
Faults in reusable software components have to be repaired before these components are reused (12). Probability: Moderate. Effects: Serious.
Changes to requirements that require major design rework are proposed (9). Probability: Moderate. Effects: Serious.
The organization is restructured so that different managements are responsible for the project (4). Probability: High. Effects: Serious.
The database used in the system cannot process as many transactions per second as expected (11). Probability: Moderate. Effects: Serious.
The time required to develop the software is underestimated (1). Probability: High. Effects: Serious.
Software tools cannot be integrated (14). Probability: High. Effects: Tolerable.
Customers fail to understand the impact of requirements changes (10). Probability: Moderate. Effects: Tolerable.
Required training for staff is not available (8). Probability: Moderate. Effects: Tolerable.
The rate of defect repair is underestimated (2). Probability: Moderate. Effects: Tolerable.
The size of the software is underestimated (3). Probability: High. Effects: Tolerable.
Code generated by code generation tools is inefficient (13). Probability: Moderate. Effects: Insignificant.

Figure 22.4 Risk types and examples

Of course, both the probability and the assessment of the effects of a risk may change as more information about the risk becomes available and as risk management plans are implemented. You should therefore update this table during each iteration of the risk management process.

Once the risks have been analyzed and ranked, you should assess which of these risks are most significant. Your judgment must depend on a combination of the probability of the risk arising and the effects of that risk. In general, catastrophic risks should always be considered, as should all serious risks that have more than a moderate probability of occurrence.

Boehm (Boehm 1988) recommends identifying and monitoring the “top 10” risks. However, I think that the right number of risks to monitor must depend on the project. It might be 5 or it might be 15. From the risks identified in Figure 22.4, I think that it is appropriate to consider the eight risks that have catastrophic or serious consequences (Figure 22.5).


22.1.3 Risk planning

The risk planning process develops strategies to manage the key risks that threaten the project. For each risk, you have to think of actions that you might take to minimize the disruption to the project if the problem identified in the risk occurs. You should also think about the information that you need to collect while monitoring the project so that emerging problems can be detected before they become serious.

In risk planning, you have to ask “what-if” questions that consider individual risks, combinations of risks, and external factors that affect these risks. For example, questions that you might ask are:

1. What if several engineers are ill at the same time?

2. What if an economic downturn leads to budget cuts of 20% for the project?

3. What if the performance of open-source software is inadequate and the only expert on that open-source software leaves?

4. What if the company that supplies and maintains software components goes out of business?

5. What if the customer fails to deliver the revised requirements as predicted?

Based on the answers to these “what-if” questions, you may devise strategies for managing the risks. Figure 22.5 shows possible risk management strategies that have been identified for the key risks (i.e., those with catastrophic or serious consequences) shown in Figure 22.4. These strategies fall into three categories:

1. Avoidance strategies Following these strategies means that the probability that the risk will arise is reduced. An example of a risk avoidance strategy is the strategy for dealing with defective components shown in Figure 22.5.

2. Minimization strategies Following these strategies means that the impact of the risk is reduced. An example of a risk minimization strategy is the strategy for staff illness shown in Figure 22.5.

3. Contingency plans Following these strategies means that you are prepared for the worst and have a strategy in place to deal with it. An example of a contingency strategy is the strategy for organizational financial problems that I have shown in Figure 22.5.

You can see a clear analogy here with the strategies used in critical systems to ensure reliability, security, and safety, where you must avoid, tolerate, or recover from failures. Obviously, it is best to use a strategy that avoids the risk. If this is not possible, you should use a strategy that reduces the chances that the risk will have serious effects. Finally, you should have strategies in place to cope with the risk if it arises. These should reduce the overall impact of a risk on the project or product.

Organizational financial problems: Prepare a briefing document for senior management showing how the project is making a very important contribution to the goals of the business and presenting reasons why cuts to the project budget would not be cost-effective.
Recruitment problems: Alert customer to potential difficulties and the possibility of delays; investigate buying-in components.
Staff illness: Reorganize team so that there is more overlap of work and people therefore understand each other’s jobs.
Defective components: Replace potentially defective components with bought-in components of known reliability.
Requirements changes: Derive traceability information to assess requirements change impact; maximize information hiding in the design.
Organizational restructuring: Prepare a briefing document for senior management showing how the project is making a very important contribution to the goals of the business.
Database performance: Investigate the possibility of buying a higher-performance database.
Underestimated development time: Investigate buying-in components; investigate use of automated code generation.

Figure 22.5 Strategies to help manage risk

22.1.4 Risk monitoring

Risk monitoring is the process of checking that your assumptions about the product, process, and business risks have not changed. You should regularly assess each of the identified risks to decide whether or not that risk is becoming more or less probable. You should also think about whether or not the effects of the risk have changed.

To do this, you have to look at other factors, such as the number of requirements change requests, which give you clues about the risk probability and its effects. These factors are obviously dependent on the types of risk. Figure 22.6 gives some examples of factors that may be helpful in assessing these risk types.

You should monitor risks regularly at all stages in a project. At every management review, you should consider and discuss each of the key risks separately. You should decide if the risk is more or less likely to arise and if the seriousness and consequences of the risk have changed.

Estimation: Failure to meet agreed schedule; failure to clear reported defects.
Organizational: Organizational gossip; lack of action by senior management.
People: Poor staff morale; poor relationships among team members; high staff turnover.
Requirements: Many requirements change requests; customer complaints.
Technology: Late delivery of hardware or support software; many reported technology problems.
Tools: Reluctance by team members to use tools; complaints about software tools; requests for faster computers/more memory, and so on.

Figure 22.6 Risk indicators

22.2 Managing people

The people working in a software organization are its greatest assets. It is expensive to recruit and retain good people, and it is up to software managers to ensure that the engineers working on a project are as productive as possible. In successful companies and economies, this productivity is achieved when people are respected by the organization and are assigned responsibilities that reflect their skills and experience.

It is important that software project managers understand the technical issues that influence the work of software development. Unfortunately, however, good software engineers are not always good people managers. Software engineers often have strong technical skills but may lack the softer skills that enable them to motivate and lead a project development team. As a project manager, you should be aware of the potential problems of people management and should try to develop people management skills.

There are four critical factors that influence the relationship between a manager and the people that he or she manages:

1. Consistency All the people in a project team should be treated in a comparable way. No one expects all rewards to be identical, but people should not feel that their contribution to the organization is undervalued.

2. Respect Different people have different skills, and managers should respect these differences. All members of the team should be given an opportunity to make a contribution. In some cases, of course, you will find that people simply don’t fit into a team and they cannot continue, but it is important not to jump to conclusions about them at an early stage in the project.

3. Inclusion People contribute effectively when they feel that others listen to them and take account of their proposals. It is important to develop a working environment where all views, even those of the least experienced staff, are considered.

4. Honesty As a manager, you should always be honest about what is going well and what is going badly in the team. You should also be honest about your level of technical knowledge and be willing to defer to staff with more knowledge when necessary. If you try to cover up ignorance or problems, you will eventually be found out and will lose the respect of the group.

Practical people management has to be based on experience, so my aim in this section and the following section on teamwork is to raise awareness of the most important issues that project managers may have to deal with.

22.2.1 Motivating people

As a project manager, you need to motivate the people who work with you so that they will contribute to the best of their abilities. In practice, motivation means organizing work and its environment to encourage people to work as effectively as possible. If people are not motivated, they will be less interested in the work they are doing. They will work slowly, be more likely to make mistakes, and will not contribute to the broader goals of the team or the organization.

To provide this encouragement, you should understand a little about what

moti-

vates people. Maslow (Maslow 1954) suggests that people are motivated

by satisfy-

ing their needs. These needs are arranged in a series of levels, as shown in

Figure 22.7.

The lower levels of this hierarchy represent fundamental needs for food,

sleep, and

so on, and the need to feel secure in an environment. Social need is

concerned with

the need to feel part of a social grouping. Esteem need represents the need

to feel

respected by others, and self-realization need is concerned with personal

develop-

ment. People need to satisfy lower-level needs such as hunger before the

more

abstract, higher-level needs.

People working in software development organizations are not usually hungry, thirsty, or physically threatened by their environment. Therefore, making sure that people’s social, esteem, and self-realization needs are satisfied is most important from a management point of view.

1. To satisfy social needs, you need to give people time to meet their co-workers and provide places for them to meet. Software companies such as Google provide social space in their offices for people to get together. This is relatively easy when all of the members of a development team work in the same place, but, increasingly, team members are not located in the same building or even the same town or state. They may work for different organizations or from home most of the time.

654 Chapter 22 Project management

Figure 22.7 Human needs hierarchy (levels, from bottom to top: physiological needs, safety needs, social needs, esteem needs, self-realization needs)

Social networking systems and teleconferencing can be used for remote communications, but my experience with these systems is that they are most effective when people already know each other. You should arrange some face-to-face meetings early in the project so that people can directly interact with other members of the team. Through this direct interaction, people become part of a social group and accept the goals and priorities of that group.

2. To satisfy esteem needs, you need to show people that they are valued by the organization. Public recognition of achievements is a simple and effective way of doing this. Obviously, people must also feel that they are paid at a level that reflects their skills and experience.

3. Finally, to satisfy self-realization needs, you need to give people responsibility for their work, assign them demanding (but not impossible) tasks, and provide opportunities for training and development where people can enhance their skills. Training is an important motivating influence as people like to gain new knowledge and learn new skills.

Maslow’s model of motivation is helpful up to a point, but I think that a problem with it is that it takes an exclusively personal viewpoint on motivation. It does not take adequate account of the fact that people feel themselves to be part of an organization, a professional group, and one or more cultures. Being a member of a cohesive group is highly motivating for most people. People with fulfilling jobs often like to go to work because they are motivated by the people they work with and the work that they do. Therefore, as a manager, you also have to think about how a group as a whole can be motivated. I discuss this and other teamwork issues in Section 22.3.

In Figure 22.8, I illustrate a problem of motivation that managers often have to face. In this example, a competent group member loses interest in the work and in the group as a whole. The quality of her work falls and becomes unacceptable. This situation has to be dealt with quickly. If you don’t sort out the problem, the other group members will become dissatisfied and feel that they are doing an unfair share of the work.


Case study: Motivation

Alice is a software project manager working in a company that develops alarm systems. This company wishes to enter the growing market of assistive technology to help elderly and disabled people live independently. Alice has been asked to lead a team of six developers that can develop new products based on the company’s alarm technology.

Alice’s assistive technology project starts well. Good working relationships develop within the team, and creative new ideas are developed. The team decides to develop a system through which a user can initiate and control the alarm system from a cell phone or tablet computer. However, some months into the project, Alice notices that Dorothy, a hardware expert, starts coming into work late, that the quality of her work is deteriorating, and, increasingly, that she does not appear to be communicating with other members of the team.

Alice talks about the problem informally with other team members to try to find out if Dorothy’s personal circumstances have changed and if this might be affecting her work. They don’t know of anything, so Alice decides to talk with Dorothy to try to understand the problem.

After some initial denials of any problem, Dorothy admits that she has lost interest in the job. She expected that she would be able to develop and use her hardware interfacing skills. However, because of the product direction that has been chosen, she has little opportunity to use these skills. Basically, she is working as a C programmer on the alarm system software.

While she admits that the work is challenging, she is concerned that she is not developing her interfacing skills. She is worried that finding a job that involves hardware interfacing will be difficult after this project. Because she does not want to upset the team by revealing that she is thinking about the next project, she has decided that it is best to minimize conversation with them.

Figure 22.8 Individual motivation

In this example, Alice tries to find out if Dorothy’s personal circumstances could be the problem. Personal difficulties commonly affect motivation because people cannot concentrate on their work. You may have to give them time and support to resolve these issues, although you also have to make it clear that they still have a responsibility to their employer.

Dorothy’s motivation problem is one that can arise when projects develop in an unexpected direction. People who expect to do one type of work may end up doing something completely different. In those circumstances, you may decide that the team member should leave the team and find opportunities elsewhere. In this example, however, Alice decides to try to convince Dorothy that broadening her experience is a positive career step. She gives Dorothy more design autonomy and organizes training courses in software engineering that will give her more opportunities after her current project has finished.

Psychological personality type also influences motivation. Bass and Dunteman (Bass and Dunteman 1963) identified three classifications for professional workers:

1. Task-oriented people, who are motivated by the work they do. In software engineering, these are people who are motivated by the intellectual challenge of software development.


The People Capability Maturity Model

The People Capability Maturity Model (P-CMM) is a framework for assessing how well organizations manage the development of their staff. It highlights best practice in people management and provides a basis for organizations to improve their people management processes. It is better suited to large companies than to small, informal ones.

http://software-engineering-book.com/web/people-cmm/

2. Self-oriented people, who are principally motivated by personal success and recognition. They are interested in software development as a means of achieving their own goals. They often have longer-term goals, such as career progression, that motivate them, and they wish to be successful in their work to help realize these goals.

3. Interaction-oriented people, who are motivated by the presence and actions of co-workers. As more and more attention is paid to user interface design, interaction-oriented individuals are becoming more involved in software engineering.

Research has shown that interaction-oriented personalities usually like to work as part of a group, whereas task-oriented and self-oriented people usually prefer to act as individuals. Women are more likely to be interaction-oriented than men are. They are often more effective communicators. I discuss the mix of these different personality types in groups in the case study shown later in Figure 22.10.

Each individual’s motivation is made up of elements of each class, but one type of motivation is usually dominant at any one time. However, individuals can change. For example, technical people who feel they are not being properly rewarded can become self-oriented and put personal interests before technical concerns. If a group works particularly well, self-oriented people can become more interaction-oriented.

22.3 Teamwork

Most professional software is developed by project teams that range in size from two to several hundred people. However, as it is impossible for everyone in a large group to work together on a single problem, large teams are usually split into a number of smaller groups. Each group is responsible for developing part of the overall system. The best size for a software engineering group is 4 to 6 members, and groups should never have more than 12 members. When groups are small, communication problems are reduced. Everyone knows everyone else, and the whole group can get around a table for a meeting to discuss the project and the software that they are developing.


Putting together a group that has the right balance of technical skills, experience, and personalities is a critical management task. However, successful groups are more than simply a collection of individuals with the right balance of skills. A good group is cohesive and thinks of itself as a strong, single unit. The people involved are motivated by the success of the group as well as by their own personal goals.

In a cohesive group, members think of the group as more important than the individuals who are group members. Members of a well-led, cohesive group are loyal to the group. They identify with group goals and other group members. They attempt to protect the group, as an entity, from outside interference. This makes the group robust and able to cope with problems and unexpected situations.

The benefits of creating a cohesive group are:

1. The group can establish its own quality standards Because these standards are established by consensus, they are more likely to be observed than external standards imposed on the group.

2. Individuals learn from and support each other Group members learn by working together. Inhibitions caused by ignorance are minimized as mutual learning is encouraged.

3. Knowledge is shared Continuity can be maintained if a group member leaves. Others in the group can take over critical tasks and ensure that the project is not unduly disrupted.

4. Refactoring and continual improvement is encouraged Group members work collectively to deliver high-quality results and fix problems, irrespective of the individuals who originally created the design or program.

Good project managers should always try to encourage group cohesiveness. They may try to establish a sense of group identity by naming the group and establishing a group identity and territory. Some managers like explicit group-building activities such as sports and games, although these are not always popular with group members. Social events for group members and their families are a good way to bring people together.

One of the most effective ways of promoting cohesion is to be inclusive. That is, you should treat group members as responsible and trustworthy, and make information freely available. Sometimes managers feel that they cannot reveal certain information to everyone in the group. This invariably creates a climate of mistrust. An effective way of making people feel valued and part of a group is to make sure that they know what is going on.

You can see an example in the case study in Figure 22.9. Alice arranges regular informal meetings where she tells the other group members what is going on. She makes a point of involving people in the product development by asking them to come up with new ideas derived from their own family experiences.

Case study: Team spirit

Alice, an experienced project manager, understands the importance of creating a cohesive group. As her company is developing a new product, she takes the opportunity to involve all group members in the product specification and design by getting them to discuss possible technology with elderly members of their families. She encourages them to bring these family members to meet other members of the development group.

Alice also arranges monthly lunches for everyone in the group. These lunches are an opportunity for all team members to meet informally, talk around issues of concern, and get to know each other. At the lunch, Alice tells the group what she knows about organizational news, policies, strategies, and so forth. Each team member then briefly summarizes what they have been doing, and the group discusses a general topic, such as new product ideas from elderly relatives.

Every few months, Alice organizes an “away day” for the group where the team spends two days on “technology updating.” Each team member prepares an update on a relevant technology and presents it to the group. This is an offsite meeting, and plenty of time is scheduled for discussion and social interaction.

Figure 22.9 Group cohesion

The “away days” are also good ways of promoting cohesion: People relax together while they help each other learn about new technologies.

Whether or not a group is effective depends, to some extent, on the nature of the project and the organization doing the work. If an organization is in a state of turmoil with constant reorganizations and job insecurity, it is difficult for team members to focus on software development. Similarly, if a project keeps changing and is in danger of cancellation, people lose interest in it.

Given a stable organizational and project environment, the three factors that have the biggest effect on team working are:

1. The people in the group You need a mix of people in a project group as software development involves diverse activities such as negotiating with clients, programming, testing, and documentation.

2. The way the group is organized A group should be organized so that individuals can contribute to the best of their abilities and tasks can be completed as expected.

3. Technical and managerial communications Good communication between group members, and between the software engineering team and other project stakeholders, is essential.

As with all management issues, getting the right team cannot guarantee project success. Too many other things can go wrong, including changes to the business and the business environment. However, if you don’t pay attention to group composition, organization, and communications, you increase the likelihood that your project will run into difficulties.


22.3.1 Selecting group members

A manager or team leader’s job is to create a cohesive group and organize that group so that they work together effectively. This task involves selecting a group with the right balance of technical skills and personalities. Sometimes people are hired from outside the organization; more often, software engineering groups are put together from current employees who have experience on other projects. Managers rarely have a completely free hand in team selection. They often have to use the people who are available in the company, even if they are not the ideal people for the job.

Many software engineers are motivated primarily by their work. Software development groups, therefore, are often composed of people who have their own ideas about how technical problems should be solved. They want to do the best job possible, so they may deliberately redesign systems that they think can be improved and add extra system features that are not in the system requirements. Agile methods encourage engineers to take the initiative to improve the software. However, sometimes this means that time is spent doing things that aren’t really needed and that different engineers compete to rewrite each other’s code.

Technical knowledge and ability should not be the only factor used to select group members. The “competing engineers” problem can be reduced if the people in the group have complementary motivations. People who are motivated by the work are likely to be the strongest technically. People who are self-oriented will probably be best at pushing the work forward to finish the job. People who are interaction-oriented help facilitate communications within the group. I think that it is particularly important to have interaction-oriented people in a group. They like to talk to people and can detect tensions and disagreements at an early stage, before these problems have a serious impact on the group.

In the case study in Figure 22.10, I have suggested how Alice, the project manager, has tried to create a group with complementary personalities. This particular group has a good mix of interaction- and task-oriented people, but I have already discussed, in Figure 22.8, how Dorothy’s self-oriented personality has caused problems because she has not been doing the work that she expected. Fred’s part-time role in the group as a domain expert might also be a problem. He is mostly interested in technical challenges, so he may not interact well with other group members. The fact that he is not always part of the team means that he may not fully relate to the team’s goals.

It is sometimes impossible to choose a group with complementary personalities. If this is the case, the project manager has to control the group so that individual goals do not take precedence over organizational and group objectives. This control is easier to achieve if all group members participate in each stage of the project. Individual initiative is most likely to be misdirected when group members are given instructions without being aware of the part that their task plays in the overall project. For example, say a software engineer takes over the development of a system and notices that possible improvements could be made to the design. If he or she implements these improvements without understanding the rationale for the original design, any changes, though well-intentioned, might have adverse implications for other parts of the system.

Case study: Group composition

In creating a group for assistive technology development, Alice is aware of the importance of selecting members with complementary personalities. When interviewing potential group members, she tried to assess whether they were task-oriented, self-oriented, or interaction-oriented. She felt that she was primarily a self-oriented type because she considered the project to be a way of getting noticed by senior management and possibly being promoted. She therefore looked for one or perhaps two interaction-oriented personalities, with task-oriented individuals to complete the team. The final assessment that she arrived at was:

Alice—self-oriented
Brian—task-oriented
Chun—interaction-oriented
Dorothy—self-oriented
Ed—interaction-oriented
Fiona—task-oriented
Fred—task-oriented
Hassan—interaction-oriented

Figure 22.10 Group composition

If all the members of the group are involved in the design from the start, they are more likely to understand why design decisions have been made. They may then identify with these decisions rather than oppose them.

22.3.2 Group organization

The way a group is organized affects the group’s decisions, the ways information is exchanged, and the interactions between the development group and external project stakeholders. Important organizational questions for project managers include the following:

1. Should the project manager be the technical leader of the group? The technical leader or system architect is responsible for the critical technical decisions made during software development. Sometimes the project manager has the skill and experience to take on this role. However, for large projects, it is best to separate technical and managerial roles. The project manager should appoint a senior engineer to be the project architect, who will take responsibility for technical leadership.

2. Who will be involved in making critical technical decisions, and how will these decisions be made? Will decisions be made by the system architect or the project manager or by reaching consensus among a wider range of team members?

3. How will interactions with external stakeholders and senior company management be handled? In many cases, the project manager will be responsible for these interactions, assisted by the system architect if there is one. However, an alternative organizational model is to create a dedicated role concerned with external liaison and appoint someone with appropriate interaction skills to that role.


Hiring the right people

Project managers are often responsible for selecting the people in the organization who will join their software engineering team. Getting the best possible people in this process is very important as poor selection decisions may be a serious risk to the project. Key factors that should influence the selection of staff are education and training, application domain and technology experience, communication ability, adaptability, and problem-solving ability.

http://software-engineering-book.com/web/people-selection/

4. How can groups integrate people who are not co-located? It is now common for groups to include members from different organizations and for people to work from home as well as in a shared office. This change has to be considered in group decision-making processes.

5. How can knowledge be shared across the group? Group organization affects information sharing as certain methods of organization are better for sharing than others. However, you should avoid too much information sharing as people become overloaded and excessive information distracts them from their work.

Small programming groups are usually organized in an informal way. The group leader gets involved in the software development with the other group members. In an informal group, the group as a whole discusses the work to be carried out, and tasks are allocated according to ability and experience. More senior group members may be responsible for the architectural design. However, detailed design and implementation is the responsibility of the team member who is allocated to a particular task.

Agile development teams are always informal groups. Agile enthusiasts claim that formal structure inhibits information exchange. Many decisions that are usually seen as management decisions (such as decisions on schedule) may be devolved to group members. However, there still needs to be a project manager who is responsible for strategic decision making and communications outside of the group.

Informal groups can be very successful, particularly when most group members are experienced and competent. Such a group makes decisions by consensus, which improves cohesiveness and performance. However, if a group is composed mostly of inexperienced or incompetent members, informality can be a hindrance. With no experienced engineers to direct the work, the result can be a lack of coordination between group members and, possibly, eventual project failure.

In hierarchical groups, the group leader is at the top of the hierarchy. He or she has more formal authority than the group members and so can direct their work. There is a clear organizational structure, and decisions are made toward the top of the hierarchy and implemented by people lower down. Communications are primarily instructions from senior staff; the people at lower levels of the hierarchy have relatively little communication with the managers at the upper levels.


Hierarchical groups can work well when a well-understood problem can be easily broken down into software components that can be developed in different parts of the hierarchy. This grouping allows for rapid decision making, which is why military organizations follow this model. However, it rarely works well for complex software engineering. In software development, effective team communication at all levels is essential:

1. Changes to the software often require changes to several parts of the system, and this requires discussion and negotiation at all levels in the hierarchy.

2. Software technologies change so fast that more junior staff may know more about new technologies than experienced staff. Top-down communications may mean that the project manager does not find out about the opportunities of using these new technologies. More junior staff may become frustrated because of what they see as old-fashioned technologies being used for development.

A major challenge facing project managers is the difference in technical ability between group members. The best programmers may be up to 25 times more productive than the worst programmers. It makes sense to use these “super-programmers” in the most effective way and to provide them with as much support as possible. At the same time, focusing on the super-programmers can be demotivating for other group members who are resentful that they are not given responsibility. They may be concerned that this will affect their career development. Furthermore, if a “super-programmer” leaves the company, the impact on a project can be huge. Therefore, adopting a group model that is based on individual experts can pose significant risks.

22.3.3 Group communications

It is absolutely essential that group members communicate effectively and efficiently with each other and with other project stakeholders. Group members must exchange information on the status of their work, the design decisions that have been made, and changes to previous design decisions. They have to resolve problems that arise with other stakeholders and inform these stakeholders of changes to the system, the group, and delivery plans. Good communication also helps strengthen group cohesiveness. Group members come to understand the motivations, strengths, and weaknesses of other people in the group.

The effectiveness and efficiency of communications are influenced by:

1. Group size As a group gets bigger, it gets harder for members to communicate effectively. The number of one-way communication links is n × (n − 1), where n is the group size, so, with a group of eight members, there are 56 possible communication pathways. This means that it is quite possible that some people will rarely communicate with each other. Status differences between group members mean that communications are often one-way. Managers and experienced engineers tend to dominate communications with less experienced staff, who may be reluctant to start a conversation or make critical remarks.


The physical work environment

Group communications and individual productivity are both affected by the team’s working environment. Individual workspaces are better for concentration on detailed technical work as people are less likely to be distracted by interruptions. However, shared workspaces are better for communications. A well-designed work environment takes both of these needs into account.

http://software-engineering-book.com/web/workspace/

2. Group structure People in informally structured groups communicate more effectively than people in groups with a formal, hierarchical structure. In hierarchical groups, communications tend to flow up and down the hierarchy. People at the same level may not talk to each other. This is a particular problem in a large project with several development groups. If people working on different subsystems only communicate through their managers, then there are more likely to be delays and misunderstandings.

3. Group composition People with the same personality types (discussed in Section 22.2) may clash, and, as a result, communications can be inhibited. Communication is also usually better in mixed-sex groups than in single-sex groups (Marshall and Heslin 1975). Women are often more interaction-oriented than men and may act as interaction controllers and facilitators for the group.

4. The physical work environment The organization of the workplace is a major factor in facilitating or inhibiting communications. While some companies use standard open-plan offices for their staff, others invest in providing a workspace that includes a mixture of private and group working areas. This allows for both collaborative activities and individual development that require a high level of concentration.

5. The available communication channels There are many different forms of communication—face to face, email messages, formal documents, telephone, and technologies such as social networking and wikis. As project teams become increasingly distributed, with team members working remotely, you need to make use of interaction technologies, such as conferencing systems, to facilitate group communications.
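The combinatorial growth described in point 1 above can be checked directly. The following short Python sketch is illustrative (the function name is my own, not from the text):

```python
def one_way_links(n: int) -> int:
    """Number of possible one-way communication links in a group of n members.

    Each of the n members can send information to each of the other
    n - 1 members, giving n * (n - 1) directed links in total.
    """
    return n * (n - 1)

# Print the growth as the group expands; note that doubling the group
# size roughly quadruples the number of pathways to manage.
for size in (4, 6, 8, 12):
    print(size, one_way_links(size))
```

For the recommended group size of 4 to 6 members there are only 12 to 30 pathways; at the 12-member limit there are already 132, which illustrates why larger teams are split into smaller groups.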

Project managers usually work to tight deadlines, and, consequently, they often try to use communication channels that don’t take up too much of their time. They may rely on meetings and formal documents to pass on information to project staff and stakeholders and send long emails to project staff. Unfortunately, while this may be an efficient approach to communication from a project manager’s perspective, it is not usually very effective. There are often good reasons why people can’t attend meetings, and so they don’t hear the presentation. People do not have time to read long documents and emails that are not directly relevant to their work. When several versions of the same document are produced, readers find it difficult to keep track of the changes.


Effective communication is achieved when communications are two-way and the people involved can discuss issues and information and establish a common understanding of proposals and problems. All this can be done through meetings, although these meetings are often dominated by powerful personalities. Informal discussions when a manager meets with the team for coffee are sometimes more effective.

More and more project teams include remote members, which also makes meetings more difficult. To involve them in communications, you may make use of wikis and blogs to support information exchange. Wikis support the collaborative creation and editing of documents, and blogs support threaded discussions about questions and comments made by group members. Wikis and blogs allow project members and external stakeholders to exchange information, irrespective of their location. They help manage information and keep track of discussion threads, which often become confusing when conducted by email. You can also use instant messaging and teleconferences, which can be easily arranged, to resolve issues that need discussion.

Key Points

- Good software project management is essential if software engineering projects are to be developed on schedule and within budget.

- Software management is distinct from other engineering management. Software is intangible. Projects may be novel or innovative, so there is no body of experience to guide their management. Software processes are not as mature as traditional engineering processes.

- Risk management involves identifying and assessing major project risks to establish the probability that they will occur and the consequences for the project if that risk does arise. You should make plans to avoid, manage, or deal with likely risks if or when they arise.

- People management involves choosing the right people to work on a project and organizing the team and its working environment so that they are as productive as possible.

- People are motivated by interaction with other people, by the recognition of management and their peers, and by being given opportunities for personal development.

- Software development groups should be fairly small and cohesive. The key factors that influence the effectiveness of a group are the people in that group, the way that it is organized, and the communication between group members.

- Communications within a group are influenced by factors such as the status of group members, the size of the group, the gender composition of the group, personalities, and available communication channels.


Further Reading

The Mythical Man Month: Essays on Software Engineering (Anniversary Edition). The problems of software management have remained largely unchanged since the 1960s, and this is one of the best books on the topic. It presents an interesting and readable account of the management of one of the first very large software projects, the IBM OS/360 operating system. The anniversary edition (published 20 years after the original edition in 1975) includes other classic papers by Brooks. (F. P. Brooks, 1995, Addison-Wesley).

Peopleware: Productive Projects and Teams, 2nd ed. This now classic book focuses on the importance of treating people properly when managing software projects. It is one of the few books that recognizes how the place where people work influences communications and productivity. Strongly recommended. (T. DeMarco and T. Lister, 1999, Dorset House).

Waltzing with Bears: Managing Risk on Software Projects. A very practical and easy-to-read introduction to risks and risk management. (T. DeMarco and T. Lister, 2003, Dorset House).

Effective Project Management: Traditional, Agile, Extreme, 7th ed. This is a textbook on project management in general rather than software project management. It is based on the so-called PMBOK (Project Management Body of Knowledge) and, unlike most books on this topic, discusses PM techniques for agile projects. (R. K. Wysocki, 2014).

Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/software-management/

Exercises

22.1. Explain why the intangibility of software systems poses special problems for software project management.

22.2. Explain how company size and software size are factors that affect software project management.

22.3. Using reported instances of project problems in the literature, list management difficulties and errors that occurred in these failed programming projects. (I suggest that you start with The Mythical Man Month, as suggested in Further Reading.)

22.4. In addition to the risks shown in Figure 22.1, identify at least six other possible risks that could arise in software projects.

22.5. What is risk monitoring? How can risks be monitored? List a few examples of types of risks and their potential indicators.


22.6. Fixed-price contracts, where the contractor bids a fixed price to complete a system development, may be used to move project risk from client to contractor. If anything goes wrong, the contractor has to pay. Suggest how the use of such contracts may increase the likelihood that product risks will arise.

22.7. Explain why keeping all members of a group informed about progress and technical decisions in a project can improve group cohesiveness.

22.8. What qualities of a cohesive group's members make the group robust? List the key benefits of creating a cohesive group.

22.9. Write a case study in the style used here to illustrate the importance of communications in a project team. Assume that some team members work remotely and that it is not possible to get the whole team together at short notice.

22.10. Your manager asks you to deliver software to a schedule that you know can only be met by asking your project team to work unpaid overtime. All team members have young children. Discuss whether you should accept this demand from your manager or whether you should persuade your team to give their time to the organization rather than to their families. What factors might be significant in your decision?

References

Bass, B. M., and G. Dunteman. 1963. "Behaviour in Groups as a Function of Self, Interaction and Task Orientation." J. Abnorm. Soc. Psychology 66 (4): 19–28. doi:10.1037/h0042764.

Boehm, B. W. 1988. "A Spiral Model of Software Development and Enhancement." IEEE Computer 21 (5): 61–72. doi:10.1109/2.59.

Hall, E. 1998. Managing Risk: Methods for Software Systems Development. Reading, MA: Addison-Wesley.

Marshall, J. E., and R. Heslin. 1975. "Boys and Girls Together: Sexual Composition and the Effect of Density on Group Size and Cohesiveness." J. of Personality and Social Psychology 35 (5): 952–961. doi:10.1037/h0076838.

Maslow, A. A. 1954. Motivation and Personality. New York: Harper & Row.

Ould, M. 1999. Managing Software Quality and Business Risk. Chichester, UK: John Wiley & Sons.

23 Project planning

Objectives

The objective of this chapter is to introduce project planning, scheduling, and cost estimation. When you have read the chapter, you will:

- understand the fundamentals of software costing and the factors that affect the price of a software system to be developed for external clients;
- know what sections should be included in a project plan that is created within a plan-driven development process;
- understand what is involved in project scheduling and the use of bar charts to present a project schedule;
- have been introduced to agile project planning based on the "planning game";
- understand cost estimation techniques and how the COCOMO II model can be used for software cost estimation.

Contents

23.1 Software pricing
23.2 Plan-driven development
23.3 Project scheduling
23.4 Agile planning
23.5 Estimation techniques
23.6 COCOMO cost modeling


Project planning is one of the most important jobs of a software project manager. As a manager, you have to break down the work into parts and assign them to project team members, anticipate problems that might arise, and prepare tentative solutions to those problems. The project plan, which is created at the start of a project and updated as the project progresses, is used to show how the work will be done and to assess progress on the project.

Project planning takes place at three stages in a project life cycle:

1. At the proposal stage, when you are bidding for a contract to develop or provide a software system. You need a plan at this stage to help you decide if you have the resources to complete the work and to work out the price that you should quote to a customer.

2. During the project startup phase, when you have to plan who will work on the project, how the project will be broken down into increments, how resources will be allocated across your company, and so on. Here, you have more information than at the proposal stage, and you can therefore refine the initial effort estimates that you have prepared.

3. Periodically throughout the project, when you update your plan to reflect new information about the software and its development. You learn more about the system being implemented and the capabilities of your development team. As software requirements change, the work breakdown has to be altered and the schedule extended. This information allows you to make more accurate estimates of how long the work will take.

Planning at the proposal stage is inevitably speculative, as you do not have a complete set of requirements for the software to be developed. You have to respond to a call for proposals based on a high-level description of the software functionality that is required. A plan is often a required part of a proposal, so you have to produce a credible plan for carrying out the work. If you win the contract, you then have to re-plan the project, taking into account changes since the proposal was made and new information about the system, the development process, and the development team.

When you are bidding for a contract, you have to work out the price that you will propose to the customer for developing the software. As a starting point for calculating this price, you need to draw up an estimate of your costs for completing the project work. Estimation involves working out how much effort is required to complete each activity and, from this step, calculating the total cost of activities. You should always calculate software costs objectively, with the aim of accurately predicting the cost of developing the software. Once you have a reasonable estimate of the likely costs, you are then in a position to calculate the price that you will quote to the customer. As I discuss in the next section, many factors influence the pricing of a software project—it is not simply cost plus profit.


Overhead costs

When you estimate the costs of effort on a software project, you don't simply multiply the salaries of the people involved by the time spent on the project. You have to take into account all of the organizational overheads (office space, administration, etc.) that must be covered by the income from a project. You calculate the costs by computing these overheads and adding a proportion to the costs of each engineer working on a project.

http://software-engineering-book.com/web/overhead-costs/

You should use three main parameters when computing the costs of a software development project:

- effort costs (the costs of paying software engineers and managers);
- hardware and software costs, including hardware maintenance and software support; and
- travel and training costs.

For most projects, the biggest cost is the effort cost. You have to estimate the total effort (in person-months) that is likely to be required to complete the work of a project. Obviously, you have limited information to make such an estimate. You therefore make the best possible estimate and then add contingency (extra time and effort) in case your initial estimate is optimistic.
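The effort, overhead, and contingency calculation described here can be sketched as a simple function. All figures in this sketch (salary, overhead rate, contingency rate) are hypothetical examples, not values given in the text.

```python
# Sketch of a project cost estimate: effort cost, plus an organizational
# overhead proportion, plus a contingency margin in case the effort
# estimate is optimistic. All figures are hypothetical examples.

def project_cost_estimate(effort_person_months, salary_per_month,
                          overhead_rate=0.5, contingency_rate=0.3):
    effort_cost = effort_person_months * salary_per_month
    with_overhead = effort_cost * (1 + overhead_rate)   # office space, admin, etc.
    return with_overhead * (1 + contingency_rate)       # margin for optimism

# 60 person-months at $8,000 per person-month comes to about $936,000
# once 50% overhead and 30% contingency are added.
print(project_cost_estimate(60, 8000))
```

The overhead and contingency rates would in practice come from the organization's accounts and its experience with earlier estimates.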

For commercial systems, you normally use commodity hardware, which is relatively cheap. However, software costs can be significant if you have to license middleware and platform software. Extensive travel may be needed when a project is developed at different sites. While travel costs themselves are usually a small fraction of the effort costs, the time spent traveling is often wasted and adds significantly to the effort costs of the project. You can use electronic meeting systems and other collaborative software to reduce travel and so have more time available for productive work.

Once a contract to develop a system has been awarded, the outline project plan for the project has to be refined to create a project startup plan. At this stage, you should know more about the requirements for this system. Your aim should be to create a project plan with enough detail to help make decisions about project staffing and budgeting. You use this plan as a basis for allocating resources to the project from within the organization and to help decide if you need to hire new staff.

The plan should also define project monitoring mechanisms. You must keep track of the progress of the project and compare actual and planned progress and costs. Although most companies have formal procedures for monitoring, a good manager should be able to form a clear picture of what is going on through informal discussions with project staff. Informal monitoring can predict potential project problems by revealing difficulties as they occur. For example, daily discussions with project staff might reveal that the team is having problems with a software fault in the communications systems. The project manager can then immediately assign a communications expert to the problem to help find and solve the problem.

The project plan always evolves during the development process because of requirements changes, technology issues, and development problems. Development planning is intended to ensure that the project plan remains a useful document for staff to understand what is to be achieved and when it is to be delivered. Therefore, the schedule, cost estimate, and risks all have to be revised as the software is developed.

If an agile method is used, there is still a need for a project startup plan, because regardless of the approach used, the company still needs to plan how resources will be allocated to a project. However, this is not a detailed plan, and you only need to include essential information about the work breakdown and project schedule. During development, an informal project plan and effort estimates are drawn up for each release of the software, with the whole team involved in the planning process. Some aspects of agile planning have already been covered in Chapter 3, and I discuss other approaches in Section 23.4.

23.1 Software pricing

In principle, the price of a software system developed for a customer is simply the cost of development plus profit for the developer. In practice, however, the relationship between the project cost and the price quoted to the customer is not usually so simple. When calculating a price, you take broader organizational, economic, political, and business considerations into account (Figure 23.1). You need to think about organizational concerns, the risks associated with the project, and the type of contract that will be used. These issues may cause the price to be adjusted upward or downward.

To illustrate some of the project pricing issues, consider the following scenario:

A small software company, PharmaSoft, employs 10 software engineers. It has just finished a large project but only has contracts in place that require five development staff. However, it is bidding for a very large contract with a major pharmaceutical company that requires 30 person-years of effort over two years. The project will not start for at least 12 months but, if granted, it will transform the finances of the company.

PharmaSoft gets an opportunity to bid on a project that requires six people and has to be completed in 10 months. The costs (including overheads) of this project are estimated at $1.2 million. However, to improve its competitive position, PharmaSoft decides to bid a price to the customer of $0.8 million. This means that, although it loses money on this contract, it can retain specialist staff for the more profitable future projects that are likely to come on stream in a year's time.


Figure 23.1 Factors affecting software pricing

Contractual terms: A customer may be willing to allow the developer to retain ownership of the source code and reuse it in other projects. The price charged might then be reduced to reflect the value of the source code to the developer.

Cost estimate uncertainty: If an organization is unsure of its cost estimate, it may increase its price by a contingency over and above its normal profit.

Financial health: Companies with financial problems may lower their price to gain a contract. It is better to make a smaller-than-normal profit or break even than to go out of business. Cash flow is more important than profit in difficult economic times.

Market opportunity: A development organization may quote a low price because it wishes to move into a new segment of the software market. Accepting a low profit on one project may give the organization the opportunity to make a greater profit later. The experience gained may also help it develop new products.

Requirements volatility: If the requirements are likely to change, an organization may lower its price to win a contract. After the contract is awarded, high prices can be charged for changes to the requirements.

This is an example of an approach to software pricing called "pricing to win." Pricing to win means that a company has some idea of the price that the customer expects to pay and makes a bid for the contract based on the customer's expected price. This may seem unethical and unbusinesslike, but it does have advantages for both the customer and the system provider.

A project cost is agreed on the basis of an outline proposal. Negotiations then take place between the client and the contractor to establish the detailed project specification. This specification is constrained by the agreed cost. The buyer and seller must agree on what is acceptable system functionality. The fixed factor in many projects is not the project requirements but the cost. The requirements may be changed so that the project costs remain within budget.

For example, say a company (OilSoft) is bidding for a contract to develop a fuel delivery system for an oil company that schedules deliveries of fuel to its service stations. There is no detailed requirements document for this system, so OilSoft estimates that a price of $900,000 is likely to be competitive and within the oil company's budget. After being granted the contract, OilSoft then negotiates the detailed requirements of the system so that basic functionality is delivered. It then estimates the additional costs for other requirements.

This approach has advantages for both the software developer and the customer. The requirements are negotiated to avoid requirements that are difficult to implement and potentially very expensive. Flexible requirements make it easier to reuse software. The oil company has awarded the contract to a known company that it can trust. Furthermore, it may be possible to spread the cost of the project over several versions of the system. This may reduce the costs of system deployment and allow the client to budget for the project cost over several financial years.

23.2 Plan-driven development

Plan-driven or plan-based development is an approach to software engineering where the development process is planned in detail. A project plan is created that records the work to be done, who will do it, the development schedule, and the work products. Managers use the plan to support project decision making and as a way of measuring progress. Plan-driven development is based on engineering project management techniques and can be thought of as the "traditional" way of managing large software development projects. Agile development involves a different planning process, discussed in Section 23.4, where decisions are delayed.

The problem with plan-driven development is that early decisions have to be revised because of changes to the environments in which the software is developed and used. Delaying planning decisions avoids unnecessary rework. However, the arguments in favor of a plan-driven approach are that early planning allows organizational issues (availability of staff, other projects, etc.) to be taken into account. Potential problems and dependencies are discovered before the project starts, rather than once the project is underway.

In my view, the best approach to project planning involves a sensible mixture of plan-based and agile development. The balance depends on the type of project and the skills of the people who are available. At one extreme, large security- and safety-critical systems require extensive up-front analysis and may have to be certified before they are put into use. These systems should be mostly plan-driven. At the other extreme, small to medium-size information systems, to be used in a rapidly changing competitive environment, should be mostly agile. Where several companies are involved in a development project, a plan-driven approach is normally used to coordinate the work across each development site.

23.2.1 Project plans

In a plan-driven development project, a project plan sets out the resources available to the project, the work breakdown, and a schedule for carrying out the work. The plan should identify the approach that is taken to risk management as well as risks to the project and the software under development. The details of project plans vary depending on the type of project and organization, but plans normally include the following sections:

1. Introduction Briefly describes the objectives of the project and sets out the constraints (e.g., budget, time) that affect the management of the project.

2. Project organization Describes the way in which the development team is organized, the people involved, and their roles in the team.


Figure 23.2 Project plan supplements

Configuration management plan: Describes the configuration management procedures and structures to be used.

Deployment plan: Describes how the software and associated hardware (if required) will be deployed in the customer's environment. This should include a plan for migrating data from existing systems.

Maintenance plan: Predicts the maintenance requirements, costs, and effort.

Quality plan: Describes the quality procedures and standards that will be used in a project.

Validation plan: Describes the approach, resources, and schedule used for system validation.

3. Risk analysis Describes possible project risks, the likelihood of these risks arising, and the risk reduction strategies (discussed in Chapter 22) that are proposed.

4. Hardware and software resource requirements Specifies the hardware and support software required to carry out the development. If hardware has to be purchased, estimates of the prices and the delivery schedule may be included.

5. Work breakdown Sets out the breakdown of the project into activities and identifies the inputs to and the outputs from each project activity.

6. Project schedule Shows the dependencies between activities, the estimated time required to reach each milestone, and the allocation of people to activities. The ways in which the schedule may be presented are discussed in the next section of the chapter.

7. Monitoring and reporting mechanisms Defines the management reports that should be produced, when these should be produced, and the project monitoring mechanisms to be used.

The main project plan should always include a project risk assessment and a schedule for the project. In addition, you may develop a number of supplementary plans for activities such as testing and configuration management. Figure 23.2 shows some supplementary plans that may be developed. These are all usually needed in large projects developing large, complex systems.

23.2.2 The planning process

Project planning is an iterative process that starts when you create an initial project plan during the project startup phase. Figure 23.3 is a UML activity diagram that shows a typical workflow for a project planning process. Plan changes are inevitable. As more information about the system and the project team becomes available


Figure 23.3 The project planning process. (UML activity diagram: the project planner identifies constraints, identifies risks, defines the project schedule, and defines milestones and deliverables. The work is then done and progress is monitored against the plan. Minor problems and slippages lead to re-planning; serious problems trigger risk mitigation actions and re-planning of the project. The loop repeats until the project is finished.)

during the project, you should regularly revise the plan to reflect requirements, schedule, and risk changes. Changing business goals also leads to changes in project plans. As business goals change, this could affect all projects, which may then have to be re-planned.

At the beginning of a planning process, you should assess the constraints affecting the project. These constraints are the required delivery date, staff available, overall budget, available tools, and so on. In conjunction with this assessment, you should also identify the project milestones and deliverables. Milestones are points in the schedule against which you can assess progress, for example, the handover of the system for testing. Deliverables are work products that are delivered to the customer, for example, a requirements document for the system.

The process then enters a loop that terminates when the project is complete. You draw up an estimated schedule for the project, and the activities defined in the schedule are initiated or are approved to continue. After some time (usually about two to three weeks), you should review progress and note discrepancies from the planned schedule.

Because initial estimates of project parameters are inevitably approximate, minor slippages are normal and you will have to make modifications to the original plan. You should make realistic rather than optimistic assumptions when you are defining a project plan. Problems of some description always arise during a project, and these lead to project delays. Your initial assumptions and scheduling should therefore be pessimistic and take unexpected problems into account. You should include contingency in your plan so that if things go wrong, then your delivery schedule is not seriously disrupted.

If there are serious problems with the development work that are likely to lead to significant delays, you need to initiate risk mitigation actions to reduce the risks of project failure. In conjunction with these actions, you also have to re-plan the project. This may involve renegotiating the project constraints and deliverables with the customer. A new schedule of when work should be completed also has to be established and agreed to with the customer.


If this renegotiation is unsuccessful or the risk mitigation actions are ineffective, then you should arrange for a formal project technical review. The objectives of this review are to find an alternative approach that will allow the project to continue. Reviews should also check that the customer's goals are unchanged and that the project remains aligned with these goals.

The outcome of a review may be a decision to cancel a project. This may be a result of technical or managerial failings but, more often, is a consequence of external changes that affect the project. The development time for a large software project is often several years. During that time, the business objectives and priorities inevitably change. These changes may mean that the software is no longer required or that the original project requirements are inappropriate. Management may then decide to stop software development or to make major changes to the project to reflect the changes in the organizational objectives.

23.3 Project scheduling

Project scheduling is the process of deciding how the work in a project will be organized as separate tasks, and when and how these tasks will be executed. You estimate the calendar time needed to complete each task and the effort required, and you suggest who will work on the tasks that have been identified. You also have to estimate the hardware and software resources that are needed to complete each task. For example, if you are developing an embedded system, you have to estimate the time that you need on specialized hardware and the costs of running a system simulator.

In terms of the planning stages that I introduced in the introduction of this chapter, an initial project schedule is usually created during the project startup phase. This schedule is then refined and modified during development planning. Both plan-based and agile processes need an initial project schedule, although less detail is included in an agile project plan. This initial schedule is used to plan how people will be allocated to projects and to check the progress of the project against its contractual commitments. In traditional development processes, the complete schedule is initially developed and then modified as the project progresses. In agile processes, there has to be an overall schedule that identifies when the major phases of the project will be completed. An iterative approach to scheduling is then used to plan each phase.

Scheduling in plan-driven projects (Figure 23.4) involves breaking down the total work involved in a project into separate tasks and estimating the time required to complete each task. Tasks should normally last at least a week and no longer than 2 months. Finer subdivision means that a disproportionate amount of time must be spent on re-planning and updating the project plan. The maximum amount of time for any task should be 6 to 8 weeks. If a task will take longer than this, it should be split into subtasks for project planning and scheduling.

Some of these tasks are carried out in parallel, with different people working on different components of the system. You have to coordinate these parallel tasks and organize the work so that the workforce is used optimally and you don't introduce


Figure 23.4 The project scheduling process. (Activities: identify activities, identify activity dependencies, estimate resources for activities, allocate people to activities, and create project charts. The input is software requirements and design information; the output is bar charts describing the project schedule.)

unnecessary dependencies between the tasks. It is important to avoid a situation where the whole project is delayed because a critical task is unfinished.

If a project is technically advanced, initial estimates will almost certainly be optimistic even when you try to consider all eventualities. In this respect, software scheduling is no different from scheduling any other type of large advanced project. New aircraft, bridges, and even new models of cars are frequently late because of unanticipated problems. Schedules, therefore, must be continually updated as better progress information becomes available. If the project being scheduled is similar to a previous project, previous estimates may be reused. However, projects may use different design methods and implementation languages, so experience from previous projects may not be applicable in the planning of a new project.

When you are estimating schedules, you must take into account the possibility that things will go wrong. People working on a project may fall ill or leave, hardware may fail, and essential support software or hardware may be delivered late. If the project is new and technically advanced, parts of it may turn out to be more difficult and take longer than originally anticipated.

A good rule of thumb is to estimate as if nothing will go wrong and then

increase

your estimate to cover anticipated problems. A further contingency factor

to cover

unanticipated problems may also be added to the estimate. This extra

contingency factor depends on the type of project, the process parameters

(deadline, standards, etc.), and the quality and experience of the software

engineers working on the project. Contingency estimates may add 30 to

50% to the effort and time required for the project.
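The padding rule described above can be sketched as a small calculation. The factor values here are illustrative assumptions, with the contingency factor chosen from the 30–50% range the text suggests:

```python
def padded_estimate(base_effort_months: float,
                    anticipated_factor: float = 0.20,
                    contingency_factor: float = 0.40) -> float:
    """Pad a 'nothing will go wrong' estimate: first for anticipated
    problems, then with a contingency factor for unanticipated ones."""
    with_anticipated = base_effort_months * (1 + anticipated_factor)
    return with_anticipated * (1 + contingency_factor)

# A 10-month base estimate becomes 10 x 1.2 x 1.4 = 16.8 months.
print(round(padded_estimate(10.0), 1))  # 16.8
```

The two factors are kept separate because they are set differently: the anticipated factor comes from known project risks, while the contingency factor covers what cannot be foreseen.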

23.3.1 Schedule presentation

Project schedules may simply be documented in a table or spreadsheet showing the tasks, estimated effort, duration, and task dependencies (Figure 23.5). However, this style of presentation makes it difficult to see the relationships and dependencies between the different activities. For this reason, alternative graphical visualizations of project schedules have been developed that are often easier to read and understand. Two types of visualization are commonly used:

1. Calendar-based bar charts show who is responsible for each activity, the expected elapsed time, and when the activity is scheduled to begin and end. Bar charts are also called Gantt charts, after their inventor, Henry Gantt.

Task   Effort (person-days)   Duration (days)   Dependencies
T1     15                     10
T2     8                      15
T3     20                     15                T1 (M1)
T4     5                      10
T5     5                      10                T2, T4 (M3)
T6     10                     5                 T1, T2 (M4)
T7     25                     20                T1 (M1)
T8     75                     25                T4 (M2)
T9     10                     15                T3, T6 (M5)
T10    20                     15                T7, T8 (M6)
T11    10                     10                T9 (M7)
T12    20                     10                T10, T11 (M8)

Figure 23.5 Tasks, durations, and dependencies

2. Activity networks show the dependencies between the different activities making up a project. These networks are described in an associated web section.

Project activities are the basic planning element. Each activity has:

- a duration in calendar days or months;
- an effort estimate, which shows the number of person-days or person-months to complete the work;
- a deadline by which the activity should be complete; and
- a defined endpoint, which might be a document, the holding of a review meeting, the successful execution of all tests, or the like.
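These four attributes map naturally onto a simple record type. A minimal sketch follows; the field names and example values are assumptions for illustration, not taken from any particular planning tool:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Activity:
    """One planning activity with the four attributes listed above."""
    name: str
    duration_days: int         # elapsed calendar time
    effort_person_days: int    # person-days needed to complete the work
    deadline: date             # date by which the activity should be complete
    endpoint: str              # defined output, e.g. a document or review meeting

# Task T1 from Figure 23.5, with an assumed deadline and endpoint:
t1 = Activity("T1", duration_days=10, effort_person_days=15,
              deadline=date(2025, 3, 14), endpoint="Review meeting held")
print(t1.effort_person_days)  # 15
```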

When planning a project, you may decide to define project milestones. A milestone is a logical end to a stage of the project where the progress of the work can be reviewed. Each milestone should be documented by a brief report (often simply an email) that summarizes the work done and whether or not the work has been completed as planned. Milestones may be associated with a single task or with groups of related activities. For example, in Figure 23.5, milestone M1 is associated with task T1 and marks the end of that activity. Milestone M3 is associated with a pair of tasks T2 and T4; there is no individual milestone at the end of these tasks.


Activity charts

An activity chart is a project schedule representation that presents the project plan as a directed graph. It shows which tasks can be carried out in parallel and those that must be executed in sequence due to their dependencies on earlier activities. If a task is dependent on several other tasks, then all of these tasks must be completed before it can start. The “critical path” through the activity chart is the longest sequence of dependent tasks. This defines the project duration.

http://software-engineering-book.com/web/planning-activities/
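The longest-path calculation behind the critical path can be sketched using the durations and dependencies from Figure 23.5. This is an illustrative computation, not a tool described in the book: the earliest finish of a task is its duration plus the latest earliest-finish among its dependencies, and the project duration is the largest earliest-finish overall.

```python
tasks = {  # task: (duration in days, dependencies), from Figure 23.5
    "T1": (10, []), "T2": (15, []), "T3": (15, ["T1"]), "T4": (10, []),
    "T5": (10, ["T2", "T4"]), "T6": (5, ["T1", "T2"]), "T7": (20, ["T1"]),
    "T8": (25, ["T4"]), "T9": (15, ["T3", "T6"]), "T10": (15, ["T7", "T8"]),
    "T11": (10, ["T9"]), "T12": (10, ["T10", "T11"]),
}

finish = {}  # memoized earliest-finish times

def earliest_finish(task: str) -> int:
    if task not in finish:
        duration, deps = tasks[task]
        finish[task] = duration + max((earliest_finish(d) for d in deps), default=0)
    return finish[task]

project_duration = max(earliest_finish(t) for t in tasks)
print(project_duration)  # 60 days: the critical path T4 -> T8 -> T10 -> T12
```

Any slippage on a critical-path task (here T4, T8, T10, or T12) delays the whole project; tasks off the critical path have slack.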

Some activities create project deliverables—outputs that are delivered to the software customer. Usually, the deliverables that are required are specified in the project contract, and the customer’s view of the project’s progress depends on these deliverables. Milestones and deliverables are not the same thing. Milestones are short reports that are used for progress reporting, whereas deliverables are more substantial project outputs such as a requirements document or the initial implementation of a system.

Figure 23.5 shows a hypothetical set of tasks, their estimated effort and duration, and task dependencies. From this table, you can see that task T3 is dependent on task T1. This means that task T1 has to be completed before T3 starts. For example, T1 might be the selection of a system for reuse and T3, the configuration of the selected system. You can’t start system configuration until you have chosen and installed the application system to be modified.

Notice that the estimated duration for some tasks is more than the effort required and vice versa. If the effort is less than the duration, the people allocated to that task are not working full time on it. If the effort exceeds the duration, this means that several team members are working on the task at the same time.
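This arithmetic can be made concrete: the average staffing on a task is simply its effort divided by its duration. A small sketch using values from Figure 23.5:

```python
def average_staffing(effort_person_days: float, duration_days: float) -> float:
    """Average number of people implied by an effort/duration pair."""
    return effort_person_days / duration_days

print(average_staffing(75, 25))  # T8: 3.0 -> three people working in parallel
print(average_staffing(8, 15))   # T2: about 0.53 -> one person, part time
```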

Figure 23.6 takes the information in Figure 23.5 and presents the project schedule as a bar chart showing a project calendar and the start and finish dates of tasks. Reading from left to right, the bar chart clearly shows when tasks start and end. The milestones (M1, M2, etc.) are also shown on the bar chart. Notice that tasks that are independent may be carried out in parallel. For example, tasks T1, T2, and T4 all start at the beginning of the project.

As well as planning the delivery schedule for the software, project managers have to allocate resources to tasks. The key resource is, of course, the software engineers who will do the work. They have to be assigned to project activities. The resource allocation can be analyzed by project management tools, and a bar chart can be generated showing when staff are working on the project (Figure 23.7). People may be working on more than one task at the same time, and sometimes they are not working on the project. They may be on holiday, working on other projects, or attending training courses. I show part-time assignments using a diagonal line crossing the bar.

[Figure 23.6 Activity bar chart: tasks T1–T12 plotted against a calendar of weeks 0–11, from project start to finish, with milestones marking task completions: M1 (T1), M3 (T2 & T4), M4 (T1 & T2), M2 (T4), M5 (T3 & T6), M6 (T7 & T8), M7 (T9), M8 (T10 & T11).]

Large organizations usually employ a number of specialists who work on a project when needed. In Figure 23.7, you can see that Mary is a specialist who works on only a single task (T5) in the project. The use of specialists is unavoidable when complex systems are being developed, but it can lead to scheduling problems. If one project is delayed while a specialist is working on it, this may affect other projects where the specialist is also required. These projects may be delayed because the specialist is not available.

If a task is delayed, later tasks that are dependent on it may be affected. They cannot start until the delayed task is completed. Delays can cause serious problems with staff allocation, especially when people are working on several projects at the same time. If a task (T) is delayed, the people allocated to it may be assigned to other work (W). To complete this work may take longer than the delay, but, once assigned, they cannot simply be reassigned back to the original task. This may then lead to further delays in T as they complete W.

Normally, you should use a project planning tool, such as Basecamp or Microsoft Project, to create, update, and analyze project schedule information. Project management tools usually expect you to input project information into a table, and they create a database of project information. Bar charts and activity charts can then be generated automatically from this database.

[Figure 23.7 Staff allocation chart, weeks 0–11. Jane: T1, T3, T9, T10, T12. Ali: T1, T8. Geetha: T2, T6, T7, T10. Maya: T3, T8. Fred: T4, T8, T11, T12. Mary: T5. Hong: T7, T6.]

23.4 Agile planning

Agile methods of software development are iterative approaches where the software is developed and delivered to customers in increments. Unlike plan-driven approaches, the functionality of these increments is not planned in advance but is decided during the development. The decision on what to include in an increment depends on progress and on the customer’s priorities. The argument for this approach is that the customer’s priorities and requirements change, so it makes sense to have a flexible plan that can accommodate these changes. Cohn’s book (Cohn 2005) is an excellent introduction to agile planning.

Agile development methods such as Scrum (Rubin 2013) and Extreme Programming (Beck and Andres 2004) have a two-stage approach to planning, corresponding to the startup phase in plan-driven development and development planning:

1. Release planning, which looks ahead for several months and decides on the features that should be included in a release of a system.

2. Iteration planning, which has a shorter term outlook and focuses on planning the next increment of a system. This usually represents 2 to 4 weeks of work for the team.

I have already explained the Scrum approach to planning in Chapter 3, which is based on project backlogs and daily reviews of work to be done. It is primarily geared to iteration planning. Another approach to agile planning, which was developed as part of Extreme Programming, is based on user stories. The so-called planning game can be used in both release planning and iteration planning.

[Figure 23.8 The “planning game”: Story identification → Initial estimation → Release planning → Iteration planning → Task planning.]

The basis of the planning game (Figure 23.8) is a set of user stories (see Chapter 3) that cover all of the functionality to be included in the final system. The development team and the software customer work together to develop these stories. The team members read and discuss the stories and rank them based on the amount of time they think it will take to implement the story. Some stories may be too large to implement in a single iteration, and these are broken down into smaller stories.

The problem with ranking stories is that people often find it difficult to estimate how much effort or time is needed to do something. To make this easier, relative ranking may be used. The team compares stories in pairs and decides which will take the most time and effort, without assessing exactly how much effort will be required. At the end of this process, the list of stories has been ordered, with the stories at the top of the list taking the most effort to implement. The team then allocates notional effort points to all of the stories in the list. A complex story may have 8 points and a simple story 2 points.

Once the stories have been estimated, the relative effort is translated into the first estimate of the total effort required by using the idea of “velocity.” Velocity is the number of effort points implemented by the team, per day. This can be estimated either from previous experience or by developing one or two stories to see how much time is required. The velocity estimate is approximate but is refined during the development process. Once you have a velocity estimate, you can calculate the total effort in person-days to implement the system.
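This translation can be sketched as follows. All of the numbers are illustrative assumptions, and converting elapsed days to person-days by multiplying by team size is one reading of the per-day velocity defined above:

```python
story_points = [8, 2, 5, 3, 8, 5, 2, 3]  # notional effort points per story
velocity = 1.5                            # points the team completes per day
team_size = 4                             # assumed team size

elapsed_days = sum(story_points) / velocity   # 36 points / 1.5 = 24 days
person_days = elapsed_days * team_size        # first total-effort estimate
print(elapsed_days, person_days)  # 24.0 96.0
```

As the text notes, the velocity figure is refined as real iterations complete, so this estimate is recomputed throughout development.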

Release planning involves selecting and refining the stories that will reflect the features to be implemented in a release of a system and the order in which the stories should be implemented. The customer has to be involved in this process. A release date is then chosen, and the stories are examined to see if the effort estimate is consistent with that date. If not, stories are added or removed from the list.

Iteration planning is the first stage in developing a deliverable system increment. Stories to be implemented during that iteration are chosen, with the number of stories reflecting the time to deliver a workable system (usually 2 or 3 weeks) and the team’s velocity. When the delivery date is reached, the development iteration is complete, even if all of the stories have not been implemented. The team considers the stories that have been implemented and adds up their effort points. The velocity can then be recalculated, and this measure is used in planning the next version of the system.

At the start of each development iteration, there is a task planning stage where the developers break down stories into development tasks. A development task should take 4–16 hours. All of the tasks that must be completed to implement all of the stories in that iteration are listed. The individual developers then sign up for the specific tasks that they will implement. Each developer knows their individual velocity and so should not sign up for more tasks than they can implement in the time allotted.

This approach to task allocation has two important benefits:

1. The whole team gets an overview of the tasks to be completed in an iteration. They therefore have an understanding of what other team members are doing and who to talk to if task dependencies are identified.

2. Individual developers choose the tasks to implement; they are not simply allocated tasks by a project manager. They therefore have a sense of ownership in these tasks, and this is likely to motivate them to complete the task.

Halfway through an iteration, progress is reviewed. At this stage, half of the story effort points should have been completed. So, if an iteration involves 24 story points and 36 tasks, 12 story points and 18 tasks should have been completed. If this is not the case, then there have to be discussions with the customer about which stories should be removed from the system increment that is being developed.
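The halfway check above can be sketched as a simple predicate, using the figures from the example:

```python
def on_track(points_done: int, points_total: int,
             tasks_done: int, tasks_total: int) -> bool:
    """At the iteration midpoint, roughly half the story points
    and half the tasks should be complete."""
    return points_done * 2 >= points_total and tasks_done * 2 >= tasks_total

# The example from the text: 24 story points and 36 tasks in the iteration.
print(on_track(12, 24, 18, 36))  # True
print(on_track(8, 24, 18, 36))   # False -> renegotiate scope with the customer
```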

This approach to planning has the advantage that a software increment is always delivered at the end of each project iteration. If the features to be included in the increment cannot be completed in the time allowed, the scope of the work is reduced. The delivery schedule is never extended. However, this can cause problems as it means that customer plans may be affected. Reducing the scope may create extra work for customers if they have to use an incomplete system or change the way they work between one release of the system and another.

A major difficulty in agile planning is that it relies on customer involvement and availability. This involvement can be difficult to arrange, as customer representatives sometimes have to prioritize other work and are not available for the planning game. Furthermore, some customers may be more familiar with traditional project plans and may find it difficult to engage in an agile planning process.

Agile planning works well with small, stable development teams that can get together and discuss the stories to be implemented. However, where teams are large and/or geographically distributed, or when team membership changes frequently, it is practically impossible for everyone to be involved in the collaborative planning that is essential for agile project management. Consequently, large projects are usually planned using traditional approaches to project management.

23.5 Estimation techniques

[Figure 23.9 Estimate uncertainty: for an estimate x made at the feasibility stage, the actual value may lie anywhere between 0.25x and 4x; the uncertainty band narrows through the requirements, design, and code stages until the true value is known at delivery.]

Estimating project schedules is difficult. You have to make initial estimates on the basis of an incomplete user requirements definition. The software may have to run on unfamiliar platforms or use new development technology. The people involved in the project and their skills will probably not be known. There are so many uncertainties that it is impossible to estimate system development costs accurately during the early stages of a project. Nevertheless, organizations need to make software effort and cost estimates. Two types of techniques can be used for making estimates:

1. Experience-based techniques The estimate of future effort requirements is based on the manager’s experience of past projects and the application domain. Essentially, the manager makes an informed judgment of what the effort requirements are likely to be.

2. Algorithmic cost modeling In this approach, a formulaic approach is used to compute the project effort based on estimates of product attributes, such as size, process characteristics, and experience of staff involved.

In both cases, you need to use your judgment to estimate either the effort directly or the project and product characteristics. In the startup phase of a project, these estimates have a wide margin of error. Based on data collected from a large number of projects, Boehm et al. (B. Boehm et al. 1995) discovered that startup estimates vary significantly. If the initial estimate of effort required is x months of effort, they found that the range may be from 0.25x to 4x of the actual effort as measured when the system was delivered. During development planning, estimates become more and more accurate as the project progresses (Figure 23.9).

Experience-based techniques rely on the manager’s experience of past projects and the actual effort expended in these projects on activities that are related to software development. Typically, you identify the deliverables to be produced in a project and the different software components or systems that are to be developed. You document these in a spreadsheet, estimate them individually, and compute the total effort required. It usually helps to get a group of people involved in the effort estimation and to ask each member of the group to explain their estimate. This often reveals factors that others have not considered, and you then iterate toward an agreed group estimate.
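One small part of this group process can be mechanized: flagging rounds in which the individual estimates disagree widely, which usually signals unshared assumptions worth discussing. The spread threshold below is an illustrative assumption, not a published rule:

```python
def needs_discussion(estimates_person_days: list[float],
                     spread_limit: float = 0.5) -> bool:
    """Flag an estimation round where the spread between the lowest and
    highest estimate exceeds spread_limit as a fraction of the highest."""
    low, high = min(estimates_person_days), max(estimates_person_days)
    return (high - low) / high > spread_limit

round_one = [30.0, 45.0, 90.0]      # one estimator has considered extra factors
print(needs_discussion(round_one))  # True: (90 - 30) / 90 > 0.5

round_two = [55.0, 60.0, 70.0]      # after discussion, estimates converge
print(needs_discussion(round_two))  # False
```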


The difficulty with experience-based techniques is that a new software project may not have much in common with previous projects. Software development changes very quickly, and a project will often use unfamiliar techniques such as web services, application system configuration, or HTML5. If you have not worked with these techniques, your previous experience may not help you to estimate the effort required, making it more difficult to produce accurate costs and schedule estimates.

It is impossible to say whether experience-based or algorithmic approaches are more accurate. Project estimates are often self-fulfilling. The estimate is used to define the project budget, and the product is adjusted so that the budget figure is realized. A project that is within budget may have achieved this at the expense of features in the software being developed.

To make a comparison of the accuracy of these techniques, a number of controlled experiments would be required where several techniques were used independently to estimate the project effort and costs. No changes to the project would be allowed, and the final effort could then be compared. The project manager would not know the effort estimates, so no bias would be introduced. However, this scenario is completely impossible in real projects, so we will never have an objective comparison of these approaches.

23.5.1 Algorithmic cost modeling

Algorithmic cost modeling uses a mathematical formula to predict project costs based on estimates of the project size, the type of software being developed, and other team, process, and product factors. Algorithmic cost models are developed by analyzing the costs and attributes of completed projects, then finding the closest-fit formula to the actual costs incurred.

Algorithmic cost models are primarily used to make estimates of software development costs. However, Boehm and his collaborators (B. W. Boehm et al. 2000) discuss a range of other uses for these models, such as the preparation of estimates for investors in software companies, the assessment of alternative strategies to help manage risks, and support for decisions about reuse, redevelopment, or outsourcing.

Most algorithmic models for estimating effort in a software project are based on a simple formula:

Effort = A × Size^B × M

A: a constant factor, which depends on local organizational practices and the type of software that is developed.

Size: an assessment of the code size of the software or a functionality estimate expressed in function or application points.

B: represents the complexity of the software and usually lies between 1 and 1.5.

M: a factor that takes into account process, product, and development attributes, such as the dependability requirements for the software and the experience of the development team. These attributes may increase or decrease the overall difficulty of developing the system.
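The formula itself is trivial to evaluate once values for A, Size, B, and M have been chosen; the hard part, as discussed below, is choosing them. The parameter values in this sketch are illustrative assumptions, not calibrated figures:

```python
def effort_person_months(A: float, size_ksloc: float, B: float, M: float) -> float:
    """Effort = A x Size^B x M, with Size in thousands of lines of code."""
    return A * size_ksloc ** B * M

# e.g. A = 2.5, a 40 KSLOC system, B = 1.1, M = 1.2 (all assumed values):
estimate = effort_person_months(2.5, 40, 1.1, 1.2)
print(round(estimate))  # roughly 174 person-months
```

Note how sensitive the result is to B: because Size is raised to the power B, a small change in the complexity exponent shifts the estimate substantially for large systems.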

The number of lines of source code (SLOC) in the delivered system is the fundamental size metric that is used in many algorithmic cost models. To estimate the number of lines of code in a system, you may use a combination of approaches:

1. Compare the system to be developed with similar systems and use their code size as the basis for your estimate.

2. Estimate the number of function or application points in the system (see the following section) and formulaically convert these to lines of code in the programming language used.

3. Rank the system components using judgment of their relative sizes and use a known reference component to translate this ranking to code sizes.
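Approach 2 can be sketched as a table lookup. The SLOC-per-function-point ratios below are rough illustrative values, not the published figures from conversion tables such as QSM's:

```python
# Assumed example ratios of source lines per function point by language.
SLOC_PER_FP = {"C": 130, "Java": 55, "Python": 30}

def estimated_sloc(function_points: int, language: str) -> int:
    """Convert a function-point count to an approximate SLOC estimate."""
    return function_points * SLOC_PER_FP[language]

print(estimated_sloc(200, "Java"))  # 11000
```

The same functional size translates into very different code sizes depending on language, which is one reason function points are preferred as a language-independent measure.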

Most algorithmic estimation models have an exponential component (B in the above equation) that increases with the size and complexity of the system. This reflects the fact that costs do not usually increase linearly with project size. As the size and complexity of the software increase, extra costs are incurred because of the communication overhead of larger teams, more complex configuration management, more difficult system integration, and so on. The more complex the system, the more these factors affect the cost.

The idea of using a scientific and objective approach to cost estimation is an attractive one, but all algorithmic cost models suffer from two key problems:

1. It is practically impossible to estimate Size accurately at an early stage in a project, when only the specification is available. Function-point and application-point estimates (see later) are easier to produce than estimates of code size but are also usually inaccurate.

2. The estimates of the complexity and process factors contributing to B and M are subjective. Estimates vary from one person to another, depending on their background and experience of the type of system that is being developed.

Accurate code size estimation is difficult at an early stage in a project because the size of the final program depends on design decisions that may not have been made when the estimate is required. For example, an application that requires high-performance data management may either implement its own data management system or use a commercial database system. In the initial cost estimation, you are unlikely to know if there is a commercial database system that performs well enough to meet the performance requirements. You therefore don’t know how much data management code will be included in the system.

The programming language used for system development also affects the number of lines of code to be developed. A language like Java might mean that more lines of code are necessary than if C (say) was used. However, this extra code allows more compile-time checking, so validation costs are likely to be reduced. It is not clear how this should be taken into account in the estimation process. Code reuse also makes a difference, and some models explicitly estimate the number of lines of code reused. However, if application systems or external services are reused, it is very difficult to compute the number of lines of source code that these replace.

Software productivity

Software productivity is an estimate of the average amount of development work that software engineers complete in a week or a month. It is therefore expressed as lines of code/month, function points/month, and so forth.

However, while productivity can be easily measured where there is a tangible outcome (e.g., an administrator processes N travel claims/day), software productivity is more difficult to define. Different people may implement the same functionality in different ways, using different numbers of lines of code. The quality of the code is also important but is, to some extent, subjective. Therefore, you can’t really compare the productivity of individual engineers. It only makes sense to use productivity measures with large groups.

http://software-engineering-book.com/web/productivity/

Algorithmic cost models are a systematic way to estimate the effort required to develop a system. However, these models are complex and difficult to use. There are many attributes and considerable scope for uncertainty in estimating their values. This complexity means that the practical application of algorithmic cost modeling has been limited to a relatively small number of large companies, mostly working in defense and aerospace systems engineering.

Another barrier that discourages the use of algorithmic models is the need for calibration. Model users should calibrate their model and the attribute values using their own historical project data, as this reflects local practice and experience. However, very few organizations have collected enough data from past projects in a form that supports model calibration. Practical use of algorithmic models, therefore, has to start with the published values for the model parameters. It is practically impossible for a modeler to know how closely these relate to his or her organization.

If you use an algorithmic cost estimation model, you should develop a range of estimates (worst, expected, and best) rather than a single estimate and apply the costing formula to all of them. Estimates are most likely to be accurate when you understand the type of software that is being developed and have calibrated the costing model using local data, or when programming language and hardware choices are predefined.

23.6 COCOMO cost modeling

The best known algorithmic cost modeling technique and tool is the COCOMO II model. This empirical model was derived by collecting data from a large number of software projects of different sizes. These data were analyzed to discover the formulas that were the best fit to the observations. These formulas linked the size of the system and product, project, and team factors to the effort to develop the system. COCOMO II is a freely available model that is supported with open-source tools.

[Figure 23.10 COCOMO estimation models:
- Application composition model: based on the number of application points; used for systems developed using dynamic languages, DB programming, etc.
- Early design model: based on the number of function points; used for initial effort estimation based on system requirements and design options.
- Reuse model: based on the number of lines of code reused or generated; used for the effort to integrate reusable components or automatically generated code.
- Post-architecture model: based on the number of lines of source code; used for development effort based on the system design specification.]

COCOMO II was developed from earlier COCOMO (Constructive Cost Modeling) cost estimation models, which were largely based on original code development (B. W. Boehm 1981; B. Boehm and Royce 1989). The COCOMO II model takes into account modern approaches to software development, such as rapid development using dynamic languages, development with reuse, and database programming. COCOMO II embeds several submodels based on these techniques, which produce increasingly detailed estimates.

The submodels (Figure 23.10) that are part of the COCOMO II model are:

1. An application composition model This models the effort required to develop systems that are created from reusable components, scripting, or database programming. Software size estimates are based on application points, and a simple size/productivity formula is used to estimate the effort required.

2. An early design model This model is used during early stages of the system design after the requirements have been established. The estimate is based on the standard estimation formula that I discussed in the introduction of this chapter, with a simplified set of seven multipliers. Estimates are based on function points, which are then converted to number of lines of source code.

   Function points are a language-independent way of quantifying program functionality. You compute the total number of function points in a program by measuring or estimating the number of external inputs and outputs, user interactions, external interfaces, and files or database tables used by the system.

3. A reuse model This model is used to compute the effort required to integrate reusable components and/or automatically generated program code. It is normally used in conjunction with the post-architecture model.

4. A post-architecture model Once the system architecture has been designed, a more accurate estimate of the software size can be made. Again, this model uses the standard formula for cost estimation discussed above. However, it includes a more extensive set of 17 multipliers reflecting personnel capability, product, and project characteristics.
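The function-point count used by the early design model can be sketched as a weighted sum over the counted feature types. The weights and counts below are illustrative assumptions for an average-complexity system, not COCOMO II's calibrated values:

```python
# Assumed per-item weights for each counted feature type:
WEIGHTS = {"external_inputs": 4, "external_outputs": 5,
           "user_interactions": 4, "external_interfaces": 7, "files": 10}

# Assumed counts for a hypothetical system:
counts = {"external_inputs": 20, "external_outputs": 15,
          "user_interactions": 10, "external_interfaces": 4, "files": 12}

# Unadjusted function points: sum of count x weight over all feature types.
ufp = sum(WEIGHTS[k] * counts[k] for k in WEIGHTS)
print(ufp)  # 343
```

In practice each counted item is also classified by complexity (simple, average, or complex) and weighted accordingly before summing.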

Of course, in large systems, different parts of the system may be developed using different technologies, and you may not have to estimate all parts of the system to the same level of accuracy. In such cases, you can use the appropriate submodel for each part of the system and combine the results to create a composite estimate.

The COCOMO II model is a very complex model and, to make it easier to explain, I have simplified its presentation. You could use the models as I have explained them here for simple cost estimation. However, to use COCOMO properly, you should refer to Boehm’s book and the manual for the COCOMO II model (B. W. Boehm et al. 2000; Abts et al. 2000).

23.6.1 The application composition model

The application composition model was introduced into COCOMO II to support the estimation of effort required for prototyping projects and for projects where the software is developed by composing existing components. It is based on an estimate of weighted application points (sometimes called object points), divided by a standard estimate of application point productivity (B. W. Boehm et al. 2000). The number of application points in a program is derived from four simpler estimates:

- the number of separate screens or web pages that are displayed;
- the number of reports that are produced;
- the number of modules in imperative programming languages (such as Java); and
- the number of lines of scripting language or database programming code.

This estimate is then adjusted according to the difficulty of developing each application point. Productivity depends on the developer’s experience and capability as well as the capabilities of the software tools (ICASE) used to support development. Figure 23.11 shows the levels of application-point productivity suggested by the COCOMO model developers.

Application composition usually relies on reusing existing software and configuring application systems. Some of the application points in the system will therefore be implemented using reusable components. Consequently, you have to adjust the estimate to take into account the percentage of reuse expected. Therefore, the final formula for effort computation for system prototypes is:

PM = (NAP × (1 − %reuse/100)) / PROD

PM: the effort estimate in person-months.
NAP: the total number of application points in the delivered system.
%reuse: an estimate of the amount of reused code in the development.
PROD: the application-point productivity as shown in Figure 23.11.

Figure 23.11 Application-point productivity

Developer's experience and capability   Very low   Low   Nominal   High   Very high
ICASE maturity and capability           Very low   Low   Nominal   High   Very high
PROD (NAP/month)                        4          7     13        25     50
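The application-composition calculation can be sketched in a few lines of code. This is an illustrative sketch, not part of the COCOMO II definition: the function and dictionary names are my own, while the productivity values are those of Figure 23.11.

```python
# Productivity (NAP/month) by level, from Figure 23.11.
PROD_LEVELS = {"very low": 4, "low": 7, "nominal": 13, "high": 25, "very high": 50}

def application_composition_pm(nap, pct_reuse, prod_level="nominal"):
    """Effort in person-months: PM = (NAP x (1 - %reuse/100)) / PROD."""
    prod = PROD_LEVELS[prod_level]
    return (nap * (1 - pct_reuse / 100)) / prod
```

For example, 100 application points with 20% expected reuse at nominal productivity (13 NAP/month) gives 80/13, or about 6.2 person-months.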

23.6.2 The early design model

This model may be used during the early stages of a project, before a detailed architectural design for the system is available. The early design model assumes that user requirements have been agreed and initial stages of the system design process are underway. Your goal at this stage should be to make a quick and approximate cost estimate. Therefore, you have to make simplifying assumptions, such as the assumption that there is no effort involved in integrating reusable code.

Early design estimates are most useful for option exploration where you need to compare different ways of implementing the user requirements. The estimates produced at this stage are based on the standard formula for algorithmic models, namely:

Effort = A × Size^B × M

Based on his own large dataset, Boehm proposed that the coefficient A should be 2.94. The size of the system is expressed in KSLOC, which is the number of thousands of lines of source code. You calculate KSLOC by estimating the number of function points in the software. You then use standard tables, which relate software size to function points for different programming languages (QSM 2014), to compute an initial estimate of the system size in KSLOC.

The exponent B reflects the increased effort required as the size of the project increases. This can vary from 1.1 to 1.24 depending on the novelty of the project, the development flexibility, the risk resolution processes used, the cohesion of the development team, and the process maturity level (see web Chapter 26) of the organization. I discuss how the value of this exponent is calculated using these parameters in the description of the COCOMO II post-architecture model.


This results in an effort computation as follows:

PM = 2.94 × Size^(1.1 to 1.24) × M

M = PERS × PREX × RCPX × RUSE × PDIF × SCED × FCIL

PERS: personnel capability
PREX: personnel experience
RCPX: product reliability and complexity
RUSE: reuse required
PDIF: platform difficulty
SCED: schedule
FCIL: support facilities

The multiplier M is based on seven project and process attributes that increase or decrease the estimate. I explain these attributes on the book's web pages. You estimate values for these attributes using a six-point scale, where 1 corresponds to “very low” and 6 corresponds to “very high”; for example, PERS = 6 means that expert staff are available to work on the project.
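As a sketch, the early design computation looks like this. The function name is illustrative, and the seven cost driver attributes are passed in as already-converted numeric multipliers (the mapping from the 1–6 scale points to multiplier values is defined in the COCOMO II manual, not here):

```python
import math

def early_design_pm(ksloc, exponent_b, multipliers):
    """PM = 2.94 x Size^B x M, where M is the product of the seven
    early-design cost driver multipliers (PERS, PREX, RCPX, RUSE,
    PDIF, SCED, FCIL), supplied as numeric values."""
    m = math.prod(multipliers)
    return 2.94 * ksloc ** exponent_b * m
```

With all seven multipliers at their nominal value of 1, a 10 KSLOC system with B = 1.1 gives 2.94 × 10^1.1 ≈ 37 person-months.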

23.6.3 The reuse model

The COCOMO reuse model is used to estimate the effort required to integrate reusable or generated code. As I have discussed in Chapter 15, software reuse is now the norm in all software development. Most large systems include a significant amount of code that has been reused from previous development projects.

COCOMO II considers two types of reused code. Black-box code is code that can be reused without understanding the code or making changes to it. Examples of black-box code are components that are automatically generated from UML models or application libraries such as graphics libraries. It is assumed that the development effort for black-box code is zero. Its size is not taken into account in the overall effort computation.

White-box code is reusable code that has to be adapted to integrate it with new code or other reused components. Development effort is required for reuse because the code has to be understood and modified before it can work correctly in the system. White-box code could be automatically generated code that needs manual changes or additions. Alternatively, it can be reused components from other systems that have to be modified in the system that is being developed.

Three factors contribute to the effort involved in reusing white-box code components:

1. The effort involved in assessing whether or not a component could be reused in a system that is being developed.
2. The effort required to understand the code that is being reused.
3. The effort required to modify the reused code to adapt it and integrate it with the system being developed.


The development effort in the reuse model is calculated using the COCOMO early design model and is based on the total number of lines of code in the system. The code size includes new code developed for components that are not reused plus an additional factor that allows for the effort involved in reusing and integrating existing code. This additional factor is called ESLOC, the equivalent number of lines of new source code. That is, you express the reuse effort as the effort that would be involved in developing some additional source code.

The formula used to calculate the source code equivalence is:

ESLOC = ASLOC × (1 − AT/100) × AAM

ESLOC: the equivalent number of lines of new source code.
ASLOC: an estimate of the number of lines of code in the reused components that have to be changed.
AT: the percentage of reused code that can be modified automatically.
AAM: an Adaptation Adjustment Multiplier that reflects the additional effort required to reuse components.

In some cases, the adjustments required to reuse code are syntactic and can be implemented by an automated tool. These do not involve significant effort, so you should estimate what fraction of the changes made to reused code can be automated (AT). This reduces the total number of lines of code that have to be adapted.

The Adaptation Adjustment Multiplier (AAM) adjusts the estimate to reflect the additional effort required to reuse code. The COCOMO model documentation (Abts et al. 2000) discusses in detail how AAM should be calculated. Simplistically, AAM is the sum of three components:

1. An assessment factor (referred to as AA) that represents the effort involved in deciding whether or not to reuse components. AA varies from 0 to 8 depending on the amount of time you need to spend looking for and assessing potential candidates for reuse.

2. An understanding component (referred to as SU) that represents the costs of understanding the code to be reused and the familiarity of the engineer with the code that is being reused. SU ranges from 50 for complex, unstructured code to 10 for well-written, object-oriented code.

3. An adaptation component (referred to as AAF) that represents the costs of making changes to the reused code. These include design, code, and integration changes.

Once you have calculated a value for ESLOC, you apply the standard estimation formula to calculate the total effort required, where the Size parameter = ESLOC. Therefore, the formula to estimate the reuse effort is:

Effort = A × ESLOC^B × M

where A, B, and M have the same values as used in the early design model.


COCOMO cost drivers

COCOMO II cost drivers are attributes that reflect some of the product, team, process, and organizational factors that affect the amount of effort needed to develop a software system. For example, if a high level of reliability is required, extra effort will be needed; if there is a need for rapid delivery, extra effort will be required; if the team members change, extra effort will be required. There are 17 of these attributes in the COCOMO II model, which have been assigned estimated values by the model developers.

http://software-engineering-book.com/web/cost-drivers/

23.6.4 The post-architecture level

The post-architecture model is the most detailed of the COCOMO II models. It is used when you have an initial architectural design for the system. The starting point for estimates produced at the post-architecture level is the same basic formula used in the early design estimates:

PM = A × Size^B × M

By this stage in the process, you should be able to make a more accurate estimate of the project size, as you know how the system will be decomposed into subsystems and components. You make this estimate of the overall code size by adding three code size estimates:

1. An estimate of the total number of lines of new code to be developed (SLOC).
2. An estimate of the reuse costs based on an equivalent number of source lines of code (ESLOC), calculated using the reuse model.
3. An estimate of the number of lines of code that may be changed because of changes to the system requirements.
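The three estimates above simply sum to give the Size parameter before the standard formula is applied. A minimal sketch, with illustrative names and assuming A = 2.94 as in the early design model:

```python
def post_architecture_pm(new_ksloc, reuse_eksloc, modified_ksloc,
                         exponent_b, m):
    """PM = A x Size^B x M with A = 2.94, where Size is the sum of
    new code, reuse-equivalent code (from the reuse model), and code
    expected to change because of requirements changes, in KSLOC."""
    size = new_ksloc + reuse_eksloc + modified_ksloc
    return 2.94 * size ** exponent_b * m
```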

The final component in the estimate—the number of lines of modified code—reflects the fact that software requirements always change. This leads to rework and development of extra code, which you have to take into account. Of course there will often be even more uncertainty in this figure than in the estimates of new code to be developed.

The exponent term (B) in the effort computation formula is related to the levels of project complexity. As projects become more complex, the effects of increasing system size become more significant. The value of the exponent B is based on five factors, as shown in Figure 23.12. These factors are rated on a six-point scale from 0 to 5, where 0 means “extra high” and 5 means “very low.” To calculate B, you add the ratings, divide them by 100, and add the result to 1.01 to get the exponent that should be used.


Figure 23.12 Scale factors used in the exponent computation in the post-architecture model

Architecture/risk resolution: Reflects the extent of risk analysis carried out. Very low means little analysis; extra-high means a complete and thorough risk analysis.
Development flexibility: Reflects the degree of flexibility in the development process. Very low means a prescribed process is used; extra-high means that the client sets only general goals.
Precedentedness: Reflects the previous experience of the organization with this type of project. Very low means no previous experience; extra-high means that the organization is completely familiar with this application domain.
Team cohesion: Reflects how well the development team knows each other and works together. Very low means very difficult interactions; extra-high means an integrated and effective team with no communication problems.
Process maturity: Reflects the process maturity of the organization as discussed in web Chapter 26. The computation of this value depends on the CMM Maturity Questionnaire, but an estimate can be achieved by subtracting the CMM process maturity level from 5.

For example, imagine that an organization is taking on a project in a domain in which it has little previous experience. The project client has not defined the process to be used or allowed time in the project schedule for significant risk analysis. A new development team must be put together to implement this system. The organization has recently put in place a process improvement program and has been rated as a Level 2 organization according to the SEI capability assessment, as discussed in Chapter 26 (web chapter). These characteristics lead to estimates of the ratings used in exponent calculation as follows:

1. Precedentedness, rated low (4). This is a new project for the organization.
2. Development flexibility, rated very high (1). There is no client involvement in the development process, so there are few externally imposed changes.
3. Architecture/risk resolution, rated very low (5). There has been no risk analysis carried out.
4. Team cohesion, rated nominal (3). This is a new team, so there is no information available on cohesion.
5. Process maturity, rated nominal (3). Some process control is in place.

The sum of these values is 16. You then calculate the final value of the exponent by dividing this sum by 100 and adding 1.01 to the result. The adjusted value of B is therefore 1.17.

The overall effort estimate is refined using an extensive set of 17 product, process, and organizational attributes (see breakout box) rather than the seven attributes used in the early design model. You can estimate values for these attributes because you have more information about the software itself, its non-functional requirements, the development team, and the development process.


Figure 23.13 The effect of cost drivers on effort estimates

Exponent value                                    1.17
System size (including factors for reuse
and requirements volatility)                      128 KLOC
Initial COCOMO estimate without cost drivers      730 person-months

Reliability                                       Very high, multiplier = 1.39
Complexity                                        Very high, multiplier = 1.3
Memory constraint                                 High, multiplier = 1.21
Tool use                                          Low, multiplier = 1.12
Schedule                                          Accelerated, multiplier = 1.29
Adjusted COCOMO estimate                          2306 person-months

Reliability                                       Very low, multiplier = 0.75
Complexity                                        Very low, multiplier = 0.75
Memory constraint                                 None, multiplier = 1
Tool use                                          Very high, multiplier = 0.72
Schedule                                          Normal, multiplier = 1
Adjusted COCOMO estimate                          295 person-months

Figure 23.13 shows how the cost driver attributes influence effort estimates. Assume that the exponent value is 1.17, as discussed in the above example. Reliability (RELY), complexity (CPLX), storage (STOR), tools (TOOL), and schedule (SCED) are the key cost drivers in the project. All of the other cost drivers have a nominal value of 1, so they do not affect the effort computation.

In Figure 23.13, I have assigned maximum and minimum values to the key cost drivers to show how they influence the effort estimate. The values used are those from the COCOMO II reference manual (Abts et al. 2000). You can see that high values for the cost drivers lead to an effort estimate that is more than three times the initial estimate, whereas low values reduce the estimate to about one third of the original. This highlights the significant differences between different types of project and the difficulties of transferring experience from one application domain to another.
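The two adjusted figures in Figure 23.13 are simply the initial estimate multiplied by the product of the cost driver multipliers, which is easy to check (the helper function is illustrative):

```python
def adjusted_estimate(initial_pm, multipliers):
    """Apply each cost driver multiplier to the initial estimate."""
    pm = initial_pm
    for m in multipliers:
        pm *= m
    return pm

# Maximum case: 730 * 1.39 * 1.3 * 1.21 * 1.12 * 1.29 -> ~2306 person-months
maximum = adjusted_estimate(730, [1.39, 1.3, 1.21, 1.12, 1.29])
# Minimum case: 730 * 0.75 * 0.75 * 1 * 0.72 * 1 -> ~296 person-months
# (Figure 23.13 gives this as 295.)
minimum = adjusted_estimate(730, [0.75, 0.75, 1, 0.72, 1])
```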

23.6.5 Project duration and staffing

As well as estimating the overall costs of a project and the effort that is required to develop a software system, project managers must also estimate how long the software will take to develop and when staff will be needed to work on the project. Increasingly, organizations are demanding shorter development schedules so that their products can be brought to market before their competitors'.


The COCOMO model includes a formula to estimate the calendar time required to complete a project:

TDEV = 3 × (PM)^(0.33 + 0.2 × (B − 1.01))

TDEV: the nominal schedule for the project, in calendar months, ignoring any multiplier that is related to the project schedule.
PM: the effort computed by the COCOMO model.
B: a complexity-related exponent, as discussed in section 23.5.2.

If B = 1.17 and PM = 60, then:

TDEV = 3 × (60)^0.36 = 13 months
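The schedule calculation can be checked with a short function (the name is illustrative):

```python
def tdev(pm, exponent_b):
    """Nominal schedule in calendar months:
    TDEV = 3 x PM^(0.33 + 0.2 x (B - 1.01))."""
    return 3 * pm ** (0.33 + 0.2 * (exponent_b - 1.01))

# tdev(60, 1.17) evaluates to about 13.2 months, matching the
# worked example above after rounding.
```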

The nominal project schedule predicted by the COCOMO model does not necessarily correspond with the schedule required by the software customer. You may have to deliver the software earlier or (more rarely) later than the date suggested by the nominal schedule. If the schedule is to be compressed (i.e., software is to be developed more quickly), this increases the effort required for the project. This is taken into account by the SCED multiplier in the effort estimation computation.

Assume that a project estimated TDEV as 13 months, as suggested above, but the actual schedule required was 10 months. This represents a schedule compression of approximately 25%. Using the values for the SCED multiplier as derived by Boehm's team, we see that the effort multiplier for this level of schedule compression is 1.43. Therefore, the actual effort that will be required if this accelerated schedule is to be met is almost 50% more than the effort required to deliver the software according to the nominal schedule.

There is a complex relationship between the number of people working on a project, the effort that will be devoted to the project, and the project delivery schedule. If four people can complete a project in 13 months (i.e., 52 person-months of effort), then you might think that by adding one more person, you could complete the work in 11 months (55 person-months of effort). However, the COCOMO model suggests that you will, in fact, need six people to finish the work in 11 months (66 person-months of effort).

The reason for this is that adding people to a project reduces the productivity of existing team members. As the project team increases in size, team members spend more time communicating and defining interfaces between the parts of the system developed by other people. Doubling the number of staff (for example) therefore does not mean that the duration of the project will be halved.

Consequently, when you add an extra person, the actual increment of effort added is less than one person as others become less productive. If the development team is large, adding more people to a project sometimes increases rather than reduces the development schedule because of the overall effect on productivity.

You cannot simply estimate the number of people required for a project team by dividing the total effort by the required project schedule. Usually, a small number of people are needed at the start of a project to carry out the initial design. The team then builds up to a peak during the development and testing of the system, and then declines in size as the system is prepared for deployment. A very rapid build-up of project staff has been shown to correlate with project schedule slippage. As a project manager, you should therefore avoid adding too many staff to a project early in its lifetime.

Key Points

- The price charged for a system does not just depend on its estimated development costs and the profit required by the development company. Organizational factors may mean that the price is increased to compensate for increased risk or decreased to gain competitive advantage.

- Software is often priced to gain a contract, and the functionality of the system is then adjusted to meet the estimated price.

- Plan-driven development is organized around a complete project plan that defines the project activities, the planned effort, the activity schedule, and who is responsible for each activity.

- Project scheduling involves the creation of various graphical representations of part of the project plan. Bar charts, which show the activity duration and staffing timelines, are the most commonly used schedule representations.

- A project milestone is a predictable outcome of an activity or set of activities. At each milestone, a formal report of progress should be presented to management. A deliverable is a work product that is delivered to the project customer.

- The agile planning game involves the whole team in project planning. The plan is developed incrementally, and, if problems arise, it is adjusted so that software functionality is reduced instead of delaying the delivery of an increment.

- Estimation techniques for software may be experience-based, where managers judge the effort required, or algorithmic, where the effort required is computed from other estimated project parameters.

- The COCOMO II costing model is a mature algorithmic cost model that takes project, product, hardware, and personnel attributes into account when formulating a cost estimate.

Further Reading

Further reading suggested in Chapter 22 is also relevant to this chapter.

“Ten Unmyths of Project Estimation.” A pragmatic article that discusses the practical difficulties of project estimation and challenges some fundamental assumptions in this area. (P. Armour, Comm. ACM, 45(11), November 2002). http://dx.doi.org/10.1145/581571.581582


Agile Estimating and Planning. This book is a comprehensive description of story-based planning as used in XP, as well as a rationale for using an agile approach to project planning. The book also includes a good, general introduction to project planning issues. (M. Cohn, 2005, Prentice-Hall).

“Achievements and Challenges in COCOMO-based Software Resource Estimation.” This article presents a history of the COCOMO models and influences on these models, and discusses the variants of these models that have been developed. It also identifies further possible developments in the COCOMO approach. (B. W. Boehm and R. Valerdi, IEEE Software, 25(5), September/October 2008). http://dx.doi.org/10.1109/MS.2008.133

All About Agile: Agile Planning. This website on agile methods includes an excellent set of articles on agile planning from a number of different authors. (2007–2012). http://www.allaboutagile.com/category/agile-planning/

Project Management Knowhow: Project Planning. This website has a number of useful articles on project management in general. These are aimed at people who don't have previous experience in this area. (P. Stoemmer, 2009–2014). http://www.project-management-knowhow.com/project_planning.html

Website

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/software-management/

Exercises

23.1. Describe the factors that affect software pricing. Define the “pricing to win” approach in software pricing.

23.2. Explain why the process of project planning is iterative and why a plan must be continually reviewed during a software project.

23.3. Define project scheduling. What are the things to be considered while estimating schedules?

23.4. What is algorithmic cost modeling? What problems does it suffer from when compared with other approaches to cost estimation?

23.5. Figure 23.14 sets out a number of tasks, their durations, and their dependencies. Draw a bar chart showing the project schedule.


Figure 23.14 Scheduling example

Task   Duration   Dependencies
T1     10
T2     15         T1
T3     10         T1, T2
T4     20
T5     10
T6     15         T3, T4
T7     20         T3
T8     35         T7
T9     15         T6
T10    5          T5, T9
T11    10         T9
T12    20         T10
T13    35         T3, T4
T14    10         T8, T9
T15    20         T12, T14
T16    10         T15

23.6. Figure 23.14 shows the task durations for software project activities. Assume that a serious, unanticipated setback occurs, and instead of taking 10 days, task T5 takes 40 days. Draw up new bar charts showing how the project might be reorganized.

23.7. The planning game is based on the notion of planning to implement the stories that represent the system requirements. Explain the potential problems with this approach when software has high performance or dependability requirements.

23.8. A software manager is in charge of the development of a safety-critical software system, which is designed to control a radiotherapy machine to treat patients suffering from cancer. This system is embedded in the machine and must run on a special-purpose processor with a fixed amount of memory (256 Mbytes). The machine communicates with a patient database system to obtain the details of the patient and, after treatment, automatically records the radiation dose delivered and other treatment details in the database. The COCOMO method is used to estimate the effort required to develop this system, and an estimate of 26 person-months is computed. All cost driver multipliers were set to 1 when making this estimate.


Explain why this estimate should be adjusted to take project, personnel, product, and organizational factors into account. Suggest four factors that might have significant effects on the initial COCOMO estimate and propose possible values for these factors. Justify why you have included each factor.

23.9. Some very large software projects involve writing millions of lines of code. Explain why the effort estimation models, such as COCOMO, might not work well when applied to very large systems.

23.10. Is it ethical for a company to quote a low price for a software contract knowing that the requirements are ambiguous and that they can charge a high price for subsequent changes requested by the customer?

References

Abts, C., B. Clark, S. Devnani-Chulani, and B. W. Boehm. 2000. “COCOMO II Model Definition Manual.” Center for Software Engineering, University of Southern California. http://csse.usc.edu/csse/research/COCOMOII/cocomo2000.0/CII_modelman2000.0.pdf

Beck, K., and C. Andres. 2004. Extreme Programming Explained, 2nd ed. Boston: Addison-Wesley.

Boehm, B., B. Clark, E. Horowitz, C. Westland, R. Madachy, and R. Selby. 1995. “Cost Models for Future Software Life Cycle Processes: COCOMO 2.” Annals of Software Engineering: 1–31. doi:10.1007/BF02249046.

Boehm, B., and W. Royce. 1989. “Ada COCOMO and the Ada Process Model.” In Proc. 5th COCOMO Users’ Group Meeting. Pittsburgh: Software Engineering Institute. http://www.dtic.mil/dtic/tr/fulltext/u2/a243476.pdf

Boehm, B. W. 1981. Software Engineering Economics. Englewood Cliffs, NJ: Prentice-Hall.

Boehm, B. W., C. Abts, A. W. Brown, S. Chulani, B. K. Clark, E. Horowitz, R. Madachy, D. Reifer, and B. Steece. 2000. Software Cost Estimation with COCOMO II. Englewood Cliffs, NJ: Prentice-Hall.

Cohn, M. 2005. Agile Estimating and Planning. Englewood Cliffs, NJ: Prentice-Hall.

QSM. 2014. “Function Point Languages Table.” http://www.qsm.com/resources/function-point-languages-table

Rubin, K. S. 2013. Essential Scrum. Boston: Addison-Wesley.

24

Quality management

Objectives

The objectives of this chapter are to introduce software quality management and software measurement. When you have read the chapter, you will:

- have been introduced to the quality management process and know why quality planning is important;
- be aware of the importance of standards in the quality management process and know how standards are used in quality assurance;
- understand how reviews and inspections are used as a mechanism for software quality assurance;
- understand how quality management in agile methods is based on the development of a team quality culture;
- understand how measurement may be helpful in assessing some software quality attributes, the notion of software analytics, and the limitations of software measurement.

Contents

24.1 Software quality
24.2 Software standards
24.3 Reviews and inspections
24.4 Quality management and agile development
24.5 Software measurement


Software quality management is concerned with ensuring that developed software systems are “fit for purpose.” That is, systems should meet the needs of their users, should perform efficiently and reliably, and should be delivered on time and within budget. The use of quality management techniques along with new software technologies and testing methods has led to significant improvements in the level of software quality over the past 20 years.

Formalized quality management (QM) is particularly important in teams that are developing large, long-lifetime systems that take several years to develop. These systems are developed for external clients, usually using a plan-based process. For these systems, quality management is both an organizational and an individual project issue:

1. At an organizational level, quality management is concerned with establishing a framework of organizational processes and standards that will lead to high-quality software. The QM team should take responsibility for defining the software development processes to be used and standards that should apply to the software and related documentation, including the system requirements, design, and code.

2. At a project level, quality management involves the application of specific quality processes, checking that these planned processes have been followed, and ensuring that the project outputs meet the defined project standards. Project quality management may also involve defining a quality plan for a project. The quality plan should set out the quality goals for the project and define what processes and standards are to be used.

Software quality management techniques have their roots in methods and techniques that were developed in manufacturing industries, where the terms quality assurance and quality control are widely used. Quality assurance is the definition of processes and standards that should lead to high-quality products and the introduction of quality processes into the manufacturing process. Quality control is the application of these quality processes to weed out products that are not of the required level of quality. Both quality assurance and quality control are part of quality management.

In the software industry, some companies see quality assurance as the definition of procedures, processes, and standards to ensure that software quality is achieved. In other companies, quality assurance also includes all configuration management, verification, and validation activities that are applied after a product has been handed over by a development team.

Quality management provides an independent check on the software development process. The QM team checks the project deliverables to ensure that they are consistent with organizational standards and goals (Figure 24.1). They also check process documentation, which records the tasks that have been completed by each team working on this project. The QM team uses documentation to check that important tasks have not been forgotten or that one group has not made incorrect assumptions about what other groups have done.

The QM team in large companies is usually responsible for managing the release testing process. As I discussed in Chapter 8, this means that they manage the testing of the software before it is released to customers. In addition, they are responsible for checking that the system tests provide coverage of the requirements and that proper records of the testing process are maintained.

Figure 24.1 Quality management and software development
(The figure shows the software development process producing deliverables D1–D5, which the quality management process checks against the standards and procedures and the quality plan to produce quality review reports.)

The QM team should be independent and not part of the software development group so that they can take an objective view of the quality of the software. They can report on software quality without being influenced by software development issues.

Ideally, the QM team should have organization-wide responsibility for quality management. They should report to management above the project manager level. Because project managers have to maintain the project budget and schedule, they may be tempted to compromise on product quality to meet that schedule. An independent QM team ensures that the organizational goals of quality are not influenced by short-term budget and schedule considerations. In smaller companies, however, this is practically impossible. Quality management and software development are inevitably intertwined with people having both development and quality responsibilities.

Formalized quality planning is an integral part of plan-based development processes. It is the process of developing a quality plan for a project. The quality plan should set out the desired software qualities and describe how these qualities are to be assessed. It defines what “high-quality” software actually means for a particular system. Engineers, therefore, have a shared understanding of the most important software quality attributes.

Humphrey (Humphrey 1989), in his classic book on software management, suggests an outline structure for a quality plan. This outline includes the following:

1. Product introduction A description of the product, its intended market, and the quality expectations for the product.

2. Product plans The critical release dates and responsibilities for the product, along with plans for distribution and product servicing.

3. Process descriptions The development and service processes and standards that should be used for product development and management.

4. Quality goals The quality goals and plans for the product, including an identification and justification of critical product quality attributes.

5. Risks and risk management The key risks that might affect product quality and the actions to be taken to address these risks.


Quality plans, which are developed as part of the general project planning process, differ in detail depending on the size and type of system being developed. However, when writing quality plans, you should try to keep them as short as possible. If the document is too long, people will not read it, so defeating the purpose of producing the quality plan.

Traditional quality management is a formal process that relies on maintaining extensive documentation about testing and system validation and on how processes have been followed. In this respect, it is diametrically opposed to agile development, where the aim is to spend as little time as possible in writing documents and formalizing how the development work should be done. QM techniques have therefore had to evolve when agile methods are used. I discuss QM and agile development in Section 24.4.

24.1 Software quality

The manufacturing industry established the fundamentals of quality management in a drive to improve the quality of the products that were being made. As part of this effort, the industry developed a definition of quality that was based on conformance with a detailed product specification. The underlying assumption was that products could be completely specified and procedures could be established that could check a manufactured product against its specification. Of course, products will never exactly meet a specification, so some tolerance was allowed. If the product was “almost right,” it was classed as acceptable.

Software quality is not directly comparable with quality in manufacturing. The idea of tolerances is applicable in analog systems but does not apply to software. Furthermore, it is often impossible to come to an objective conclusion about whether or not a software system meets its specification:

1. It is difficult to write complete and unambiguous software requirements. Software developers and customers may interpret the requirements in different ways, and it may be impossible to reach agreement on whether or not software conforms to its specification.

2. Specifications usually integrate requirements from several classes of stakeholder. These requirements are inevitably a compromise and may not include the requirements of all stakeholder groups. The excluded stakeholders may therefore perceive the system as a poor-quality system, even though it implements the agreed requirements.

3. It is impossible to measure certain quality characteristics (e.g., maintainability) directly, and so they cannot be specified in an unambiguous way. I discuss the difficulties of measurement in Section 24.4.

Figure 24.2 Software quality attributes: safety, understandability, portability, security, testability, usability, reliability, adaptability, reusability, resilience, modularity, efficiency, robustness, complexity, and learnability.

Because of these problems, the assessment of software quality is a subjective process. The quality management team uses their judgment to decide if an acceptable level of quality has been achieved. They decide whether or not the software is fit for its intended purpose. This decision involves answering questions about the system’s characteristics. For example:

1. Has the software been properly tested, and has it been shown that all requirements have been implemented?

2. Is the software sufficiently dependable to be put into use?

3. Is the performance of the software acceptable for normal use?

4. Is the software usable?

5. Is the software well structured and understandable?

6. Have programming and documentation standards been followed in the development process?

There is a general assumption in software quality management that the system will be tested against its requirements. The judgment on whether or not it delivers the required functionality should be based on the results of these tests. Therefore, the QM team should review the tests that have been developed and examine the test records to check that testing has been properly carried out. In some companies, the QM group carries out final system testing; in others, a dedicated system testing team reports to the system quality manager.

The subjective quality of a software system is largely based on its non-functional characteristics. This reflects practical user experience—if the software’s functionality is not what is expected, then users will often just work around this deficiency and find other ways to do what they want to do. However, if the software is unreliable or too slow, then it is practically impossible for them to achieve their goals.

Therefore, software quality is not just about whether the software functionality has been correctly implemented, but also depends on non-functional system attributes as shown in Figure 24.2. These attributes reflect the software dependability, usability, efficiency, and maintainability.

It is not possible for any system to be optimized for all of these attributes. For example, improving security may lead to loss of performance. The quality plan should therefore define the most important quality attributes for the software that is being developed. It may be that efficiency is critical and other factors have to be sacrificed to achieve it. If you have emphasized the importance of efficiency in the quality plan, the engineers working on the development can work together to achieve this. The plan should also include a definition of the quality assessment process.

This process should be an agreed way of assessing whether some quality, such as maintainability or robustness, is present in the product.

Figure 24.3 Process-based quality: define the process, develop the product, then assess product quality; if quality is not OK, improve the process and repeat; once quality is OK, standardize the process.

Traditional software quality management is based on the assumption that the quality of software is directly related to the quality of the software development process. This assumption comes from manufacturing systems where product quality is intimately related to the production process. A manufacturing process involves configuring, setting up, and operating the machines involved in the process. Once the machines are operating correctly, product quality naturally follows. You measure the quality of the product and change the process until you achieve the quality level that you need.

Figure 24.3 illustrates this process-based approach to achieving product quality. There is a clear link between process and product quality in manufacturing because the process is relatively easy to standardize and monitor. Once manufacturing systems are calibrated, they can be run again and again to output high-quality products.

However, software is designed rather than manufactured, and the relationship between process quality and product quality is more complex. Software design is a creative process, so the influence of individual skills and experience is significant. External factors, such as the novelty of an application or commercial pressure for an early product release, also affect product quality irrespective of the process used.

Without doubt, the development process used has a significant influence on the quality of the software, and good processes are more likely to lead to good quality software. Process quality management and improvement can result in fewer defects in the software being developed. However, it is difficult to assess software quality attributes, such as reliability and maintainability, without using the software for a long period. Consequently, it is hard to tell how process characteristics influence these attributes. Furthermore, because of the role of design and creativity in the software process, process standardization can sometimes stifle creativity, which may lead to poorer rather than better quality software.

Defined processes are important, but quality managers should also aim to develop a “quality culture” in which everyone responsible for software development is committed to achieving a high level of product quality. They should encourage teams to take responsibility for the quality of their work and to develop new approaches to quality improvement. While standards and procedures are the basis of quality management, good quality managers recognize that there are intangible aspects to software quality (elegance, readability, etc.) that cannot be embodied in standards. They should support people who are interested in the intangible aspects of quality and encourage professional behavior in all team members.


Documentation standards

Project documents are a tangible way of describing the different representations of a software system (requirements, UML, code, etc.) and its production process. Documentation standards define the organization of different types of document as well as the document format. They are important because they make it easier to check that important material has not been omitted from documents and ensure that project documents have a common “look and feel.” Standards may be developed for the process of writing documents, for the documents themselves, and for document exchange.

http://software-engineering-book.com/web/documentation-standards/

24.2 Software standards

Software standards play an important role in plan-based software quality management. As I have discussed, an important part of quality assurance is the definition or selection of standards that should apply to the software development process or software product. As part of this process, tools and methods to support the use of these standards may also be chosen. Once standards have been selected for use, project-specific processes have to be defined to monitor the use of the standards and check that they have been followed.

Software standards are important for three reasons:

1. Standards capture wisdom that is of value to the organization. They are based on knowledge about the best or most appropriate practice for the company. This knowledge is often acquired only after a great deal of trial and error. Building it into a standard helps the company reuse this experience and avoid previous mistakes.

2. Standards provide a framework for defining what quality means in a particular setting. As I have discussed, software quality is subjective, and by using standards you establish a basis for deciding if a required level of quality has been achieved. Of course, this depends on setting standards that reflect user expectations for software dependability, usability, and performance.

3. Standards assist continuity when work carried out by one person is taken up and continued by another. Standards ensure that all engineers within an organization adopt the same practices. Consequently, the learning effort required when starting new work is reduced.

Two related types of software engineering standard may be defined and used in software quality management:

1. Product standards These apply to the software product being developed. They include document standards, such as the structure of requirements documents, documentation standards, such as a standard comment header for an object class definition, and coding standards, which define how a programming language should be used.

Figure 24.4 Product and process standards:

Product standards                   Process standards
Design review form                  Design review conduct
Requirements document structure     Submission of new code for system building
Method header format                Version release process
Java programming style              Project plan approval process
Project plan format                 Change control process
Change request form                 Test recording process

2. Process standards These define the processes that should be followed during software development. They should encapsulate good development practice. Process standards may include definitions of specification, design, and validation processes, process support tools, and a description of the documents that should be written during these processes.

Examples of product and process standards that may be used are shown in Figure 24.4.
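As an illustration of how a product standard can be made automatically checkable, the sketch below encodes a hypothetical class comment header standard and verifies it. The required Author:, Created:, and Purpose: fields are invented for this example, not taken from any real standard.

```python
import re

# Hypothetical required fields for a class comment header; a real
# documentation standard would define its own set.
REQUIRED_FIELDS = ("Author:", "Created:", "Purpose:")

def missing_header_fields(source: str) -> list:
    """Return the required header fields that are absent from the
    docstring of each class definition found in the source text."""
    missing = []
    for match in re.finditer(r'class\s+\w+.*?:\s*\n\s*"""(.*?)"""', source, re.S):
        doc = match.group(1)
        missing.extend(f for f in REQUIRED_FIELDS if f not in doc)
    return missing

conforming = ('class Account:\n'
              '    """Author: J. Smith\n'
              '    Created: 2015-01-10\n'
              '    Purpose: Holds account state.\n'
              '    """\n')
nonconforming = 'class Account:\n    """Keeps track of accounts."""\n'
```

A check like this could run as part of the build, so conformance to the documentation standard does not depend on reviewers spotting missing headers by eye.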

Standards have to deliver value, in the form of increased product quality. There is no point in defining standards that are expensive in terms of time and effort to apply but that only lead to marginal improvements in quality. Product standards have to be designed so that they can be applied and checked in a cost-effective way, and process standards should include the definition of processes that check if product standards have been followed.

The software engineering standards that are used within a company are usually adapted from broader national or international standards. National and international standards have been developed covering software engineering terminology, programming languages such as Java and C++, notations such as charting symbols, procedures for deriving and writing software requirements, quality assurance procedures, and software verification and validation processes (IEEE 2003). More specialized standards have been developed for safety and security critical systems.

Software engineers sometimes consider standards to be overprescriptive and irrelevant to the technical activity of software development. This is particularly likely when project standards require tedious documentation and work recording. Although they usually agree about the general need for standards, engineers often find good reasons why standards are not necessarily appropriate to their particular project. Quality managers who set the standards should therefore consider possible actions to convince engineers of the value of standards:

1. Involve software engineers in the selection of product standards If developers understand why standards have been selected, they are more likely to be committed to these standards. Ideally, the standards document should not just set out the standard to be followed but should also include commentary explaining why standardization decisions have been made.


2. Review and modify standards regularly to reflect changing technologies Standards are expensive to develop, and they tend to be enshrined in a company standards handbook. Because of the costs and discussion required, there is often a reluctance to change them. A standards handbook is essential, but it should evolve to reflect changing circumstances and technology.

3. Make sure that tool support is available to support standards-based development Developers often find standards to be a bugbear when conformance to them involves tedious manual work that could be done by a software tool. If tool support is available, standards can be followed without much extra effort. For example, program layout standards can be defined and implemented by a syntax-directed program editing system.
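To make the tool-support point concrete, here is a minimal sketch of an automated layout check of the kind such a tool might apply. The 79-character line limit and the no-tabs rule are assumed example rules, not rules from any particular published standard.

```python
MAX_LINE_LENGTH = 79  # assumed limit for this illustrative standard

def layout_violations(source: str) -> list:
    """Report simple layout-standard violations as (line_number, message)
    pairs: over-long lines and tabs used for indentation."""
    problems = []
    for n, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append((n, "line too long"))
        if line.startswith("\t"):
            problems.append((n, "tab used for indentation"))
    return problems

# Line 2 is indented with a tab; line 3 is far over the length limit.
sample = "def f():\n\treturn 1\n" + "x = " + "1 + " * 30 + "1\n"
```

Because the check is mechanical, developers get immediate feedback and the standard can be followed without tedious manual inspection.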

Different types of software need different development processes, so standards have to be adaptable. There is no point in prescribing a particular way of working if it is inappropriate for a project or project team. Each project manager should have the authority to modify process standards according to individual circumstances. However, when changes are made, it is important to ensure that these changes do not lead to a loss of product quality.

The project manager and the quality manager can avoid the problems of inappropriate standards by careful quality planning early in the project. They should decide which of the organizational standards should be used without change, which should be modified, and which should be ignored. New standards may have to be created in response to customer or project requirements. For example, standards for formal specifications may be required if these standards have not been used in previous projects.

24.2.1 The ISO 9001 standards framework

The international set of standards used in the development of quality management systems in all industries is called ISO 9000. ISO 9000 standards can be applied to a range of organizations from manufacturing through to service industries. ISO 9001, the most general of these standards, applies to organizations that design, develop, and maintain products, including software. The ISO 9001 standard was originally developed in 1987. I explain the 2008 version of the standard here, but the standard may change in 2015 when a new version is scheduled for release.

The ISO 9001 standard is not a standard for software development but rather is a framework for developing software standards. It sets out general quality principles, describes quality processes in general, and lays out the organizational standards and procedures that should be defined. These should be documented in an organizational quality manual.

A major revision of the ISO 9001 standard in 2000 reoriented the standard around nine core processes (Figure 24.5). If an organization is to be ISO 9001 conformant, it must document how its processes relate to these core processes. It must also define and maintain records demonstrating that the defined organizational processes have been followed. The company quality manual should describe the relevant processes and the process data that has to be collected and maintained.

Figure 24.5 ISO 9001 core processes: business acquisition, design and development, business management, and supplier management; product delivery processes (production and delivery, service and support); and supporting processes (configuration management, inventory management, test).

The ISO 9001 standard does not define or prescribe the specific quality processes that a company should use. To be conformant with ISO 9001, a company must define the types of process shown in Figure 24.5 and have procedures in place demonstrating that its quality processes are being followed. This allows flexibility across industrial sectors and company sizes.

Quality standards can be defined that are appropriate for the type of software being developed. Small companies can have simple processes without much documentation and still be ISO 9001 compliant. However, this flexibility means that you cannot make assumptions about the similarities or differences between the processes in different ISO 9001–compliant companies. Some companies may have very rigid quality processes that keep detailed records while others may be much less formal, with minimal additional documentation.

The relationships between ISO 9001, organizational quality manuals, and individual project quality plans are shown in Figure 24.6. This diagram has been adapted from a model given by Ince (Ince 1994), who explains how the general ISO 9001 standard can be used as a basis for software quality management processes. Bamford and Deibler (Bamford and Deibler 2003) explain how the later ISO 9001:2000 standard can be applied in software companies.

Some software customers demand that their suppliers be ISO 9001 certified. The customers can then be confident that the software development company has an approved quality management system in place. Independent accreditation authorities examine the quality management processes and process documentation and decide if these processes cover all of the areas specified in ISO 9001. If so, they certify that a company’s quality processes, as defined in the quality manual, conform to the ISO 9001 standard.

Figure 24.6 ISO 9001 and quality management: ISO 9001 quality models are instantiated as the organization quality process, which is documented in the organization quality manual; the quality manual is used to develop individual project quality plans (Projects 1, 2, and 3), which support project quality management.

Some people mistakenly think that ISO 9001 certification means that the quality of the software produced by certified companies will always be better than that from uncertified companies. The ISO 9001 standard focuses on ensuring that the organization has quality management procedures in place and that it follows these procedures. There is no guarantee that ISO 9001 certified companies use the best software development practices or that their processes lead to high-quality software.

The ISO 9001 certification is inadequate, in my view, because it defines quality to be conformance to standards. It takes no account of quality as experienced by users of the software. For example, a company could define test coverage standards specifying that all methods in objects must be called at least once. Unfortunately, this standard can be met by incomplete software testing that does not include tests with different method parameters. As long as the defined testing procedures are followed and test records are maintained, the company could be ISO 9001 certified.
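The weakness of such a coverage standard can be shown with a toy example; the function and its test below are invented for illustration. Every method is called at least once, so the standard is satisfied, yet a parameter-dependent defect survives.

```python
def clamp(value, low, high):
    """Intended to force value into the range [low, high]."""
    if value > high:
        return high
    return value  # defect: values below 'low' are never raised to 'low'

# This single test calls the function once, so a "call every method at
# least once" coverage standard is met -- but the defect only shows up
# for arguments the test never tries, such as clamp(-3, 0, 10).
assert clamp(5, 0, 10) == 5
```

The test records would show the standard was followed, even though the function is wrong for an entire class of inputs.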

24.3 Reviews and inspections

Reviews and inspections are quality assurance activities that check the quality of project deliverables. This involves checking the software, its documentation, and records of the process to discover errors and omissions as well as standards violations. As I explained in Chapter 8, reviews and inspections are used alongside program testing as part of the general process of software verification and validation.

During a review, several people examine the software and its associated documentation, looking for potential problems and nonconformance with standards. The review team makes informed judgments about the level of quality of the software or project documents. Project managers may then use these assessments to make planning decisions and allocate resources to the development process.

Quality reviews are based on documents that have been produced during the software development process. As well as software specifications, designs, code, process models, test plans, configuration management procedures, process standards, and user manuals may all be reviewed. The review should check the consistency and completeness of the documents or code under review and, if standards have been defined, make sure that these quality standards have been followed.

Figure 24.7 The software review process: pre-review activities (planning, individual preparation, group preparation), the review meeting, and post-review activities (error correction, follow-up checks, improvement).

have been

defined, make sure that these quality standards have been followed.

Reviews are not just about checking conformance to standards. They are

also

used to help discover problems and omissions in the software or project

documenta-

tion. The conclusions of the review should be formally recorded as part of

the qual-

ity management process. If problems have been discovered, the reviewers’

comments

should be passed to the author of the software or whoever is responsible

for correcting errors or omissions.

The purpose of reviews and inspections is to improve software quality, not to assess the performance of people in the development team. Reviewing is a public process of error detection, compared with the more private component-testing process. Inevitably, mistakes that are made by individuals are revealed to the whole programming team. To ensure that all developers engage constructively with the review process, project managers have to be sensitive to individual concerns. They must develop a working culture that provides support without blame when errors are discovered.

Quality reviews are not management progress reviews, although information about the software quality may be used in making management decisions. Progress reviews compare the actual progress in a software project against the planned progress. Their prime concern is whether or not the project will deliver useful software on time and on budget. Progress reviews take external factors into account, and changed circumstances may mean that software under development is no longer required or has to be radically changed. Projects that have developed high-quality software may have to be canceled because of changes to the business or its operating environment.

24.3.1 The review process

Although there are many variations in the details of reviews, review processes (Figure 24.7) are structured into three phases:

1. Pre-review activities These are preparatory activities that are essential for the review to be effective. Typically, pre-review activities are concerned with review planning and review preparation. Review planning involves setting up a review team, arranging a time and place for the review, and distributing the documents to be reviewed. During review preparation, the team may meet to get an overview of the software to be reviewed. Individual review team members read and understand the software or documents and relevant standards.


Roles in the inspection process

When program inspection was established at IBM (Fagan, 1986), a number of formal roles were defined for members of the inspection team. These included moderator, code reader, and scribe. Other users of inspections have modified these roles, but it is generally accepted that an inspection should involve the code author, an inspector, and a scribe and should be chaired by a moderator.

http://software-engineering-book.com/web/qm-roles

They work independently to find errors, omissions, and departures from standards. Reviewers may supply written comments on the software if they cannot attend the review meeting.

2. The review meeting During the review meeting, an author of the document or program being reviewed should “walk through” the document with the review team. The review itself should be relatively short—two hours at most. One team member should chair the review, and another should formally record all review decisions and actions to be taken. During the review, the chair is responsible for ensuring that all submitted comments are considered. The review chair should sign a record of comments and actions agreed during the review.

3. Post-review activities After a review meeting has ended, the issues and problems raised during the review must be addressed. Actions may involve fixing software bugs, refactoring software so that it conforms to quality standards, or rewriting documents. Sometimes the problems discovered in a quality review are such that a management review is also necessary to decide if more resources should be made available to correct them. After changes have been made, the review chair may check that all the review comments have been taken into account. Sometimes a further review will be required to check that the changes made cover all of the previous review comments.

Review teams should normally have a core of three to four people who are selected as principal reviewers. One member should be an experienced designer who will take the responsibility for making significant technical decisions. The principal reviewers may invite other project members, such as the designers of related subsystems, to contribute to the review. They may not be involved in reviewing the whole document but should concentrate on those sections that affect their work. Alternatively, the review team may circulate the document and ask for written comments from a broad spectrum of project members. The project manager need not be involved in the review, unless problems are anticipated that require changes to the project plan.

The processes suggested for reviews assume that the review team has a face-to-face meeting to discuss the software or documents they are reviewing. However, project teams are now often distributed, sometimes across countries or continents, so it is impractical for team members to meet face to face. Remote reviewing can be supported using shared documents where each review team member can annotate the document with their comments. Face-to-face meetings may be impossible because of work schedules or the fact that people work in different time zones. The review chair is responsible for coordinating comments and for discussing changes individually with the review team members.

24.3.2 Program inspections

Program inspections are peer reviews where team members collaborate to find bugs in the program that is being developed. As I discussed in Chapter 8, inspections may be part of the software verification and validation processes. They complement testing as they do not require the program to be executed. Incomplete versions of the system can be verified, and representations such as UML models can be checked. Program tests may be reviewed. Test reviews often find problems with tests and so improve their effectiveness in detecting program bugs.

Program inspections involve team members from different backgrounds who make a careful, line-by-line review of the program source code. They look for defects and problems and describe them at an inspection meeting. Defects may be logical errors, anomalies in the code that might indicate an erroneous condition, or features that have been omitted from the code. The review team examines the design models or the program code in detail and highlights anomalies and problems for repair.

During an inspection, a checklist of common programming errors is often used to focus the search for bugs. This checklist may be based on examples from books or from knowledge of defects that are common in a particular application domain. You use different checklists for different programming languages because each language has its own characteristic errors. Humphrey (Humphrey 1989), in a comprehensive discussion of inspections, gives a number of examples of inspection checklists.

Possible checks that might be made during the inspection process are

shown in

Figure 24.8. Organizations should develop their own inspection checklists

based on

local standards and practices. These checklists should be regularly

updated, as new

types of defects are found. The items in the checklist vary according to

programming language because of the different levels of checking that are

possible at compile-time. For example, a Java compiler checks that

functions have the correct number of

parameters; a C compiler does not.
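Some checklist items lend themselves to simple automation as a complement to manual inspection. As a rough sketch (my illustration, not from the book), the "are all variables initialized before use" data-fault check can be approximated for straight-line Python code with the standard ast module; the function and example program are invented for illustration:

```python
import ast
import builtins

def flag_use_before_assignment(source: str):
    """Naive data-fault check: flag names that are read before any
    assignment in a function's straight-line body (branches ignored)."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        assigned = {arg.arg for arg in func.args.args}  # parameters count as assigned
        for stmt in func.body:
            if not isinstance(stmt, ast.Assign):
                continue
            for node in ast.walk(stmt.value):  # names read on the right-hand side
                if (isinstance(node, ast.Name)
                        and node.id not in assigned
                        and not hasattr(builtins, node.id)):
                    findings.append((func.name, node.id))
            for target in stmt.targets:        # record newly assigned names
                if isinstance(target, ast.Name):
                    assigned.add(target.id)
    return findings

example = """
def total(prices):
    t = t + sum(prices)   # 't' is read before it is assigned
    return t
"""
print(flag_use_before_assignment(example))  # [('total', 't')]
```

A real inspection-support tool would handle control flow and scoping; the point here is only that each checklist row is a candidate for a cheap static check.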

Companies that use inspections have found that they are effective in finding bugs. In early work, Fagan (Fagan 1986) reported that more than 60% of the errors in a program were detected using informal program inspections. McConnell (McConnell 2004) compares unit testing, where the defect detection rate is about 25%, with inspections, where the defect detection rate was 60%. These comparisons were made before widespread automated testing; we do not know how inspections compare with that approach.

In spite of their well-publicized cost-effectiveness, many software development companies are reluctant to use inspections or peer reviews. Software engineers with experience in program testing are sometimes unwilling to accept that inspections can be more effective for defect detection than testing. Managers may be suspicious because inspections add costs during design and development, and they may not want to take the risk that there will be no corresponding savings in program testing costs.

714 Chapter 24 Quality management

Data faults
  Are all program variables initialized before their values are used?
  Have all constants been named?
  Should the upper bound of arrays be equal to the size of the array or to size − 1?
  If character strings are used, is a delimiter explicitly assigned?
  Is there any possibility of buffer overflow?

Control faults
  For each conditional statement, is the condition correct?
  Is each loop certain to terminate?
  Are compound statements correctly bracketed?
  In case statements, are all possible cases accounted for?
  If a break is required after each case in case statements, has it been included?

Input/output faults
  Are all input variables used?
  Are all output variables assigned a value before they are output?
  Can unexpected inputs cause corruption?

Interface faults
  Do all function and method calls have the correct number of parameters?
  Do formal and actual parameter types match?
  Are the parameters in the right order?
  If components access shared memory, do they have the same model of the shared memory structure?

Storage management faults
  If a linked structure is modified, have all links been correctly reassigned?
  If dynamic storage is used, has space been allocated correctly?
  Is space explicitly deallocated after it is no longer required?

Exception management faults
  Have all possible error conditions been taken into account?

Figure 24.8 An inspection checklist

24.4 Quality management and agile development

Agile methods of software engineering focus on the development of code. They minimize documentation and processes that are not directly concerned with code development, and they emphasize informal communications among team members rather than communications based on project documents. Quality, in agile development, means code quality, and practices such as refactoring and test-driven development are used to ensure that high-quality code is produced. Quality management in agile development is informal rather than document-based. It relies on establishing a quality culture, where all team members feel responsible for software quality and take actions to ensure that quality is maintained. The agile community is fundamentally opposed to what it sees as the bureaucratic overhead of standards-based approaches and quality processes such as those embodied in ISO 9001. Companies that use agile development methods are rarely concerned with ISO 9001 certification.

In agile development, quality management is based on shared good practice rather than formal documentation. Some examples of this good practice are:

1. Check before check-in Programmers are responsible for organizing their own code reviews with other team members before the code is checked in to the build system.


2. Never break the build It is not acceptable for team members to check in code that causes the system as a whole to fail. Therefore, individuals have to test their code changes against the whole system and be confident that these changes work as expected. If the build is broken, the person responsible is expected to give top priority to fixing the problem.

3. Fix problems when you see them The code of the system belongs to the team rather than to individuals. Therefore, if a programmer discovers problems or obscurities in code developed by someone else, he or she can fix these problems directly rather than referring them back to the original developer.
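Teams often automate "check before check-in" and "never break the build" with a pre-commit gate that runs the whole test suite before accepting a change. A minimal sketch (the test commands shown are placeholders; substitute your project's actual test runner):

```python
import subprocess
import sys

def gate_checkin(test_command) -> bool:
    """Run the project's test suite; permit the check-in only if it passes.
    `test_command` is whatever runs your tests, e.g.
    [sys.executable, "-m", "pytest", "-q"] for a pytest-based project."""
    result = subprocess.run(test_command)
    return result.returncode == 0

# Stand-in commands so the sketch is runnable without a real test suite:
print(gate_checkin([sys.executable, "-c", "print('all tests passed')"]))  # True
print(gate_checkin([sys.executable, "-c", "raise SystemExit(1)"]))        # False
```

In practice this logic usually lives in a version-control hook or the continuous-integration server rather than in a hand-rolled script.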

Agile processes rarely use formal inspection or review processes. In Scrum, the development team meets after each iteration to discuss quality issues and problems. The team may decide on changes to the way they work to avoid any quality problems that have emerged. A collective decision may be made to focus on refactoring and quality improvement during a sprint rather than the addition of new system functionality.

Code reviews may be the responsibility of individuals (check before check-in) or may rely on the use of pair programming. As I discussed in Chapter 3, pair programming is an approach in which two people are responsible for code development and work together to achieve it. Code developed by an individual is therefore constantly being examined and reviewed by another team member. Two people look at every line of code and check it before it is accepted.

Pair programming leads to a deep knowledge of a program, as both programmers have to understand the program in detail to continue development. This depth of knowledge is sometimes difficult to achieve in other inspection processes, and so pair programming can find bugs that would not be discovered in formal inspections. However, the two people involved cannot be as objective as an external inspection team because they are examining their own work. Potential problems are:

1. Mutual misunderstandings Both members of a pair may make the same mistake in understanding the system requirements. Discussions may reinforce these errors.

2. Pair reputation Pairs may be reluctant to look for errors because they do not want to slow down the progress of the project.

3. Working relationships The pair's ability to discover defects is likely to be compromised by their close working relationship, which often leads to a reluctance to criticize work partners.

The informal approach to quality management adopted in agile methods is particularly effective for software product development, where the company developing the software also controls its specification. There is no need to deliver quality reports to an external customer, nor is there any need to integrate with other quality management teams. However, when a large system is being developed for an external customer, agile approaches to quality management with minimal documentation may be impractical:

1. If the customer is a large company, it may have its own quality management processes and may expect the software development company to report on progress in a way that is compatible with these processes. Therefore, the development team may have to produce a formal quality plan and quality reports as required by the customer.

2. Where several geographically distributed teams are involved in development, perhaps from different companies, informal communications may be impractical. Different companies may have different approaches to quality management, and you may have to agree to produce some formal documentation.

3. For long-lifetime systems, the team involved in development will change over time. If there is no documentation, new team members may find it impossible to understand why development decisions have been made.

Consequently, the informal approach to quality management in agile methods may have to be adapted so that some quality documentation and processes are introduced. Generally, this approach is integrated with the iterative development process: instead of developing software, one of the sprints or iterations should focus on producing essential software documentation.

24.5 Software measurement

Software measurement is concerned with quantifying some attribute of a software system, such as its complexity or its reliability. By comparing the measured values to each other and to the standards that apply across an organization, you may be able to draw conclusions about the quality of software or assess the effectiveness of software processes, tools, and methods. In an ideal world, quality management could rely on measurements of attributes that affect the software quality. You could then objectively assess process and tool changes that aim to improve software quality.

For example, say you work in a company that plans to introduce a new software-testing tool. Before introducing the tool, you record the number of software defects discovered in a given time. This is a baseline for assessing the effectiveness of the tool. After using the tool for some time, you repeat this process. If more defects have been found in the same amount of time after the tool has been introduced, then you may decide that it provides useful support for the software validation process.

The long-term goal of software measurement is to use measurement to make judgments about software quality. Ideally, a system could be assessed using a range of metrics to measure its attributes. From the measurements made, a value for the quality of the system could be inferred. If the software had reached a required quality threshold, then it could be approved without review. When appropriate, the measurement tools might also highlight areas of the software that could be improved.

However, we are still a long way from this ideal situation, and automated quality assessment is unlikely to become a reality in the near future.

Figure 24.9 Predictor and control measurements (software process measurements yield control metrics; software product measurements yield predictor metrics; both feed management decisions)

A software metric is a characteristic of a software system, system documentation, or development process that can be objectively measured. Examples of metrics include the size of a product in lines of code; the Fog index, which is a measure of the readability of narrative text; the number of reported faults in a delivered software product; and the number of person-days required to develop a system component.
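The Fog index mentioned above can be computed mechanically. A rough sketch using the common Gunning formula, 0.4 × (average sentence length + percentage of "complex" words), with a crude vowel-group syllable count (both simplifications are mine, not the book's):

```python
import re

def fog_index(text: str) -> float:
    """Gunning Fog readability estimate: 0.4 * (words per sentence
    + 100 * fraction of words with three or more syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Approximate syllables as runs of vowels; good enough for a sketch.
    complex_words = [w for w in words
                     if len(re.findall(r"[aeiouy]+", w.lower())) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

print(round(fog_index("The cat sat on the mat. The dog ran away."), 2))
```

Production readability tools use dictionary-based syllable counts; the structure of the calculation is the same.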

Software metrics may be either control metrics or predictor metrics. As the names imply, control metrics support process management, and predictor metrics help you predict characteristics of the software. Control metrics are usually associated with software processes. Examples of control or process metrics are the average effort and the time required to repair reported defects.

Three kinds of process metrics can be used:

1. The time taken for a particular process to be completed This can be the total time devoted to the process, calendar time, the time spent on the process by particular engineers, and so on.

2. The resources required for a particular process Resources might include total effort in person-days, travel costs, or computer resources.

3. The number of occurrences of a particular event Examples of events that might be monitored include the number of defects discovered during code inspection, the number of requirements changes requested, the number of bug reports in a delivered system, and the average number of lines of code modified in response to a requirements change.
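Process metrics like these are straightforward to compute once the raw event data has been collected. For instance, the average defect-repair time (a control metric mentioned earlier) might be derived from bug-tracker records; the dates below are made up for illustration:

```python
from datetime import date

# Hypothetical tracker export: (date reported, date fixed) per defect.
repairs = [
    (date(2025, 3, 1), date(2025, 3, 4)),
    (date(2025, 3, 2), date(2025, 3, 3)),
    (date(2025, 3, 5), date(2025, 3, 10)),
]

# Control metric: average calendar days to repair a reported defect.
average_repair_days = sum((fixed - reported).days
                          for reported, fixed in repairs) / len(repairs)
print(average_repair_days)  # 3.0
```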

Predictor metrics (sometimes called product metrics) are associated with the software itself. Examples of predictor metrics are the cyclomatic complexity of a module, the average length of identifiers in a program, and the number of attributes and operations associated with object classes in a design. Both control and predictor metrics may influence management decision making, as shown in Figure 24.9. Managers use process measurements to decide if process changes should be made and predictor metrics to decide if software changes are necessary and if the software is ready for release.

Figure 24.10 Relationships between internal and external software attributes (internal attributes such as depth of inheritance tree, cyclomatic complexity, program size in lines of code, number of error messages, and length of user manual relate to external quality attributes such as maintainability, reliability, reusability, and usability)

In this chapter, I focus on predictor metrics, whose values are automatically assessed by analyzing code or documents. I discuss control metrics and how they are used in process improvement in web Chapter 26.

Measurements of a software system may be used in two ways:

1. To assign a value to system quality attributes By measuring the characteristics of system components and then aggregating these measurements, you may be able to assess system quality attributes, such as maintainability.

2. To identify the system components whose quality is substandard Measurements can identify individual components with characteristics that deviate from the norm. For example, you can measure components to discover those with the highest complexity. These components are most likely to contain bugs because the complexity makes it more likely that the component developer has made mistakes.

It is difficult to make direct measurements of many of the software quality attributes shown in Figure 24.2. Quality attributes such as maintainability, understandability, and usability are external attributes that relate to how developers and users experience the software. They are affected by subjective factors, such as user experience and education, and they cannot therefore be measured objectively. To make a judgment about these attributes, you have to measure some internal attributes of the software (such as its size and complexity) and assume that these are related to the quality characteristics that you are concerned with.

Figure 24.10 shows some external software quality attributes and internal attributes that could, intuitively, be related to them. The diagram suggests that there may be relationships between external and internal attributes, but it does not say how these attributes are related. Kitchenham (Kitchenham 1990) suggested that if the measure of the internal attribute is to be a useful predictor of the external software characteristic, three conditions must hold:


1. The internal attribute must be measured accurately. However, measurement is not always straightforward and may require specially developed tools.

2. A relationship must exist between the attribute that can be measured and the external quality attribute that is of interest. That is, the value of the quality attribute must be related, in some way, to the value of the attribute that can be measured.

3. This relationship between the internal and external attributes must be understood, validated, and expressed in terms of a formula or model. Model formulation involves identifying the functional form of the model (linear, exponential, etc.) by analysis of collected data, identifying the parameters that are to be included in the model, and calibrating these parameters using existing data.
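The calibration step in condition 3 can be illustrated with a least-squares fit. A toy sketch (the linear form and the data are invented for illustration) relating an internal attribute, component complexity, to an external one, average repair time:

```python
def calibrate_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b*x: choosing the model's
    parameters (a, b) from collected (x, y) data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical historical data: component complexity vs. mean repair hours.
complexity = [5, 10, 15, 20]
repair_hours = [2.0, 4.0, 6.0, 8.0]
a, b = calibrate_linear(complexity, repair_hours)
print(f"repair_hours = {a:.2f} + {b:.2f} * complexity")
```

Validating such a model, as the text stresses, matters more than fitting it: a fit always exists, but a useful predictor must hold on data it was not calibrated against.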

Recent work in the area of software analytics (Zhang et al. 2013) has used data-mining and machine-learning techniques to analyze repositories of software product and process data. The idea behind software analytics (Menzies and Zimmermann 2013) is that we do not, in fact, need a model that reflects the relationships between software quality and collected data. Rather, if there is enough data, correlations can be discovered and predictions made about software attributes. I discuss software analytics in Section 24.5.4.

We have very little published information about systematic software measurement in industry. Many companies do collect information about their software, such as the number of requirements change requests or the number of defects discovered in testing. However, it is not clear if they then use these measurements systematically to compare software products and processes or to assess the impact of changes to software processes and tools. There are several reasons why this is difficult:

1. It is impossible to quantify the return on investment of introducing an organizational metrics or software analytics program. We have seen significant improvements in software quality over the past few years without the use of metrics, so it is difficult to justify the initial costs of introducing systematic software measurement and assessment.

2. There are no standards for software metrics or standardized processes for measurement and analysis. Many companies are reluctant to introduce measurement programs until such standards and supporting tools are available.

3. Measurement may require the development and maintenance of specialized software tools. It is difficult to justify the costs of tool development when the returns from measurement are unknown.

4. In many companies, software processes are not standardized and are poorly defined and controlled. As such, there is too much process variability within the same company for measurements to be used in a meaningful way.

5. Much of the research on software measurement and metrics has focused on code-based metrics and plan-driven development processes. However, more and more software is now developed by reusing and configuring existing application systems, or by using agile methods. We don't know how previous research on metrics applies to these software development techniques.

6. Introducing measurement adds overhead to processes. This contradicts the aims of agile methods, which recommend the elimination of process activities that are not directly related to program development. Companies that have adopted agile methods are therefore not likely to adopt a metrics program.

Software measurement and metrics are the basis of empirical software engineering. In this research area, experiments on software systems and the collection of data about real projects have been used to form and validate hypotheses about software engineering methods and techniques. Researchers working in this area argue that we can be confident of the value of software engineering methods and techniques only if we can provide concrete evidence that they actually provide the benefits their inventors suggest.

However, research on empirical software engineering has not had a significant impact on software engineering practice. It is difficult to relate generic research to an individual project that differs from the research study. Many local factors are likely to be more important than general empirical results. For this reason, researchers in software analytics argue that analysts should not try to draw general conclusions but should provide analyses of the data for specific systems.

24.5.1 Product metrics

Product metrics are predictor metrics used to quantify internal attributes of a software system. Examples of product metrics include the system size, measured in lines of code, or the number of methods associated with each object class. Unfortunately, as I have explained earlier in this section, software characteristics that can be easily measured, such as size and cyclomatic complexity, do not have a clear and consistent relationship with quality attributes such as understandability and maintainability. The relationships vary depending on the development processes and technology used and the type of system that is being developed.

Product metrics fall into two classes:

1. Dynamic metrics, which are collected by measurements made of a program in execution. These metrics can be collected during system testing or after the system has gone into use. An example might be the number of bug reports or the time taken to complete a computation.

2. Static metrics, which are collected by measurements made of representations of the system, such as the design, program, or documentation. Examples of static metrics are shown in Figure 24.11.

These types of metrics are related to different quality attributes. Dynamic metrics help to assess the efficiency and reliability of a system. Static metrics help assess the complexity, understandability, and maintainability of a system or its components.


Fan-in/Fan-out
  Fan-in is a measure of the number of functions or methods that call another function or method (say X). Fan-out is the number of functions that are called by function X. A high value for fan-in means that X is tightly coupled to the rest of the design and changes to X will have extensive knock-on effects. A high value for fan-out suggests that the overall complexity of X may be high because of the complexity of the control logic needed to coordinate the called components.

Length of code
  This is a measure of the size of a program. Generally, the larger the size of the code of a component, the more complex and error-prone that component is likely to be. Length of code has been shown to be one of the most reliable metrics for predicting error-proneness in components.

Cyclomatic complexity
  This is a measure of the control complexity of a program. This control complexity may be related to program understandability. I discuss cyclomatic complexity in Chapter 8.

Length of identifiers
  This is a measure of the average length of identifiers (names for variables, classes, methods, etc.) in a program. The longer the identifiers, the more likely they are to be meaningful and hence the more understandable the program.

Depth of conditional nesting
  This is a measure of the depth of nesting of if-statements in a program. Deeply nested if-statements are hard to understand and potentially error-prone.

Fog index
  This is a measure of the average length of words and sentences in documents. The higher the value of a document's Fog index, the more difficult the document is to understand.

Figure 24.11 Static software product metrics

A clear relationship usually exists between dynamic metrics and software quality

characteristics. It is fairly easy to measure the execution time required for particular functions and to assess the time required to start up a system. These measurements relate directly to the system's efficiency. Similarly, the number of system failures and the type of failure can be logged and related directly to the reliability of the software. I have explained how reliability can be measured in Chapter 12.
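Dynamic metrics such as execution time are easy to collect with standard tooling. In Python, for instance, the timeit module times a function under test; the workload here is a placeholder standing in for a real system function:

```python
import timeit

def function_under_test():
    # Placeholder workload standing in for a real system function.
    return sorted(range(1000))

# Dynamic metric: wall-clock time for 200 executions of the function.
elapsed = timeit.timeit(function_under_test, number=200)
print(f"200 runs took {elapsed:.4f} seconds")
```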

Static metrics, as shown in Figure 24.11, have an indirect relationship with quality attributes. A large number of different metrics have been proposed, and many experiments have tried to derive and validate the relationships between these metrics and attributes such as system complexity and maintainability. None of these experiments has been conclusive, but program size and control complexity appear to be the most reliable predictors of understandability, system complexity, and maintainability.
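Control-complexity metrics can be computed directly from source code. A sketch of a McCabe-style cyclomatic complexity count for Python functions (counting decision points is one of several accepted formulations, so tools may report slightly different numbers):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """McCabe-style estimate: 1 + the number of decision points
    (if/elif, loops, exception handlers, boolean operators, ternaries)."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While,
                             ast.ExceptHandler, ast.IfExp)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            decisions += len(node.values) - 1  # each and/or adds a branch
    return decisions + 1

src = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for _ in range(n):
        pass
    return "positive"
"""
print(cyclomatic_complexity(src))  # 4
```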

The metrics in Figure 24.11 are applicable to any program, but more specific object-oriented metrics have also been proposed. Figure 24.12 summarizes Chidamber and Kemerer's suite (sometimes called the CK suite) of six object-oriented metrics (Chidamber and Kemerer 1994). Although these metrics were originally proposed in the early 1990s, they are still the most widely used object-oriented (OO) metrics. Some UML design tools automatically collect values for these metrics as UML diagrams are created.

Weighted methods per class (WMC)
  This is the number of methods in each class, weighted by the complexity of each method. Therefore, a simple method may have a complexity of 1, and a large and complex method a much higher value. The larger the value for this metric, the more complex the object class. Complex objects are more likely to be difficult to understand. They may not be logically cohesive, so they cannot be reused effectively as superclasses in an inheritance tree.

Depth of inheritance tree (DIT)
  This represents the number of discrete levels in the inheritance tree where subclasses inherit attributes and operations (methods) from superclasses. The deeper the inheritance tree, the more complex the design. Many object classes may have to be understood to understand the object classes at the leaves of the tree.

Number of children (NOC)
  This is a measure of the number of immediate subclasses in a class. It measures the breadth of a class hierarchy, whereas DIT measures its depth. A high value for NOC may indicate greater reuse. It may mean that more effort should be made in validating base classes because of the number of subclasses that depend on them.

Coupling between object classes (CBO)
  Classes are coupled when methods in one class use methods or instance variables defined in a different class. CBO is a measure of how much coupling exists. A high value for CBO means that classes are highly dependent. Therefore, it is more likely that changing one class will affect other classes in the program.

Response for a class (RFC)
  RFC is a measure of the number of methods that could potentially be executed in response to a message received by an object of that class. Again, RFC is related to complexity. The higher the value for RFC, the more complex a class, and hence the more likely it is that it will include errors.

Lack of cohesion in methods (LCOM)
  LCOM is calculated by considering pairs of methods in a class. LCOM is the difference between the number of method pairs without shared attributes and the number of method pairs with shared attributes. The value of this metric has been widely debated, and it exists in several variations. It is not clear if it really adds any additional, useful information over and above that provided by other metrics.

Figure 24.12 The CK object-oriented metrics suite

El-Emam's review of object-oriented metrics discussed the CK metrics and other OO metrics (El-Emam 2001). It concluded that there was insufficient evidence to understand how these and other object-oriented metrics relate to external software qualities. This situation has not really changed since his analysis in 2001. We still don't know how to use measurements of object-oriented programs to draw reliable conclusions about their quality.
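To give a concrete feel for the CK metrics, depth of inheritance tree is simple to compute by introspection. A sketch for Python classes (treating `object` as the root at depth 0; the example class names are invented):

```python
def depth_of_inheritance_tree(cls) -> int:
    """DIT: length of the longest path from cls up to the root class."""
    if cls is object:
        return 0
    return 1 + max(depth_of_inheritance_tree(base) for base in cls.__bases__)

class Vehicle: pass
class Car(Vehicle): pass
class ElectricCar(Car): pass

print(depth_of_inheritance_tree(ElectricCar))  # 3

# NOC, by contrast, counts immediate subclasses:
print(len(Vehicle.__subclasses__()))  # 1  (only Car inherits directly)
```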

24.5.2 Software component analysis

A measurement process that may be part of a software quality assessment process is shown in Figure 24.13. Each system component can be analyzed separately using a range of metrics. The values of these metrics may then be compared for different components and, perhaps, with historical measurement data collected on previous projects. Anomalous measurements, which deviate significantly from the norm, usually indicate problems with the quality of these components.

Figure 24.13 The process of product measurement (choose measurements to be made; select components to be assessed; measure component characteristics; identify anomalous measurements; analyze anomalous components)

The key stages in this component measurement process are:

1. Choose measurements to be made The questions that the measurement is intended to answer should be formulated and the measurements required to answer these questions defined. Measurements that are not directly relevant to these questions need not be collected.

2. Select components to be assessed You may not need to assess metric values for all of the components in a software system. Sometimes you can select a representative selection of components for measurement, allowing you to make an overall assessment of system quality. At other times, you may wish to focus on the core components of the system that are in almost constant use. The quality of these components is more important than the quality of components that are infrequently executed.

3. Measure component characteristics The selected components are measured, and the associated metric values are computed. This step normally involves processing the component representation (design, code, etc.) using an automated data collection tool. This tool may be specially written or may be a feature of design tools that are already in use.

4. Identify anomalous measurements After the component measurements have been made, you then compare them with each other and to previous measurements that have been recorded in a measurement database. You should look for unusually high or low values for each metric, as these suggest that there could be problems with the component exhibiting these values.

5. Analyze anomalous components When you have identified components that have anomalous values for your chosen metrics, you should examine them to decide whether or not these anomalous metric values mean that the quality of the component is compromised. An anomalous metric value for complexity (say) does not necessarily mean a poor-quality component. There may be some other reason for the high value, so there may not be any component quality problems.

If possible, you should maintain all collected data as an organizational resource and keep historical records of all projects, even when data has not been used during a particular project. Once a sufficiently large measurement database has been established, you can then make comparisons of software quality across projects and validate the relations between internal component attributes and quality characteristics.
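Step 4 of this process, identifying anomalous measurements, is often a simple statistical screen against the project norm. A sketch that flags components whose metric values lie well outside the mean (the component names, values, and threshold are invented for illustration):

```python
from statistics import mean, stdev

def anomalous_components(metric_values, threshold=1.5):
    """Flag components whose metric deviates from the project mean by
    more than `threshold` sample standard deviations."""
    values = list(metric_values.values())
    mu, sigma = mean(values), stdev(values)
    return {name: value for name, value in metric_values.items()
            if sigma and abs(value - mu) / sigma > threshold}

# Hypothetical cyclomatic-complexity measurements for five components.
cyclomatic = {"parser.c": 12, "ui.c": 9, "scheduler.c": 11,
              "network.c": 10, "legacy_io.c": 48}
print(anomalous_components(cyclomatic))  # {'legacy_io.c': 48}
```

As the text cautions in step 5, a flagged component is a prompt for examination, not proof of a quality problem.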


24.5.3 Measurement ambiguity

When you collect quantitative data about software and software processes, you have to analyze that data to understand its meaning. It is easy to misinterpret data and to make incorrect inferences. You cannot simply look at the data on its own; you must also consider the context in which the data is collected.

To illustrate how collected data can be interpreted in different ways, consider the following scenario, which is concerned with the number of change requests made by a system's users:

A manager decides to measure the number of change requests submitted by customers, based on an assumption that there is a relationship between these change requests and product usability and suitability. She assumes that the higher the number of change requests, the less the software meets the needs of the customer.

Handling change requests and changing the software are expensive. The organization therefore decides to modify its process with the aim of improving customer satisfaction and, at the same time, reducing the costs of making changes. The intent is that the process changes will result in better products and fewer change requests. Processes are changed to increase customer involvement in the software design process. Beta testing of all products is introduced, and customer-requested modifications are incorporated in the delivered product.

After the process changes have been made, the measurement of change requests continues. New versions of products, developed with the modified process, are delivered. In some cases, the number of change requests is reduced; in others, it is increased. The manager is baffled and finds it impossible to understand the effects of the process changes on the product quality.

To understand why this kind of ambiguity can occur, you have to understand why users might make change requests:

1. The software is not good enough and does not do what customers want it to do. They therefore request changes to deliver the functionality they require.

2. Alternatively, the software may be very good, and so it is widely and heavily used. Change requests may be generated because many software users creatively think of new things that could be done with the software.

Increasing the customer’s involvement in the process may reduce the number of change requests for products where the customers were unhappy. The process changes have been effective and have made the software more usable and suitable. Alternatively, however, the process changes may not have worked, and customers may have decided to look for an alternative system. The number of change requests might decrease because the product has lost market share to a rival product and there are consequently fewer product users.

24.5 Software measurement 725

On the other hand, the process changes might lead to many new, happy customers who wish to participate in the product development process. They therefore generate more change requests. Changes to the process of handling change requests may contribute to this increase. If the company is more responsive to customers, they may generate more change requests because they know that these requests will be taken seriously. They believe that their suggestions will probably be incorporated in later versions of the software. Alternatively, the number of change requests might have increased because the beta-test sites were not typical of most usage of the program.

To analyze the change request data, you do not simply need to know the number of change requests. You need to know who made the request, how the software is used, and why the request was made. You also need information about external factors such as modifications to the change request procedure or market changes that might have an effect. With this information, you are in a better position to find out if the process changes have been effective in increasing product quality.
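To make this concrete, a minimal sketch of context-aware analysis might look as follows. The record structure (requester type, reason) is hypothetical and invented for illustration; the point is that the grouped counts, not the raw total, distinguish the competing explanations discussed above.

```python
from collections import Counter

# Hypothetical change-request records: (requester, reason).
requests = [
    ("existing customer", "missing functionality"),
    ("existing customer", "new idea"),
    ("beta tester", "missing functionality"),
    ("new customer", "new idea"),
    ("new customer", "new idea"),
]

# The raw count says little on its own ...
print(len(requests))  # 5

# ... but grouping by reason separates "the software does not do what
# customers want" from "heavy, creative use is generating new ideas".
by_reason = Counter(reason for _, reason in requests)
print(by_reason["missing functionality"], by_reason["new idea"])  # 2 3
```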

This illustrates the difficulties of understanding the effects of changes. The “scientific” approach to this problem is to reduce the number of factors that might affect the measurements made. However, processes and products that are being measured are not insulated from their environment. The business environment is constantly changing, and it is impossible to avoid changes to work practice just because they may make comparisons of data invalid. As such, quantitative data about human activities cannot always be taken at face value. The reasons a measured value changes are often ambiguous. These reasons must be investigated in detail before any conclusions can be drawn from any measurements.

24.5.4 Software analytics

Over the past few years, the notion of “big data analysis” has emerged as a means of discovering insights by automatically mining and analyzing very large volumes of automatically collected data. It is possible to discover relationships between data items that could not be found by manual data analysis and modeling. Software analytics is the application of such techniques to data about software and software processes.

Two factors have made software analytics possible:

1. The automated collection of user data by software product companies when their product is used. If the software fails, information about the failure and the state of the system can be sent over the Internet from the user’s computer to servers run by the product developer. As a result, large volumes of data about individual products such as Internet Explorer or Photoshop have become available for analysis.

2. The use of open-source software available on platforms such as Sourceforge and GitHub and open-source repositories of software engineering data (Menzies and Zimmermann 2013). The source code of open-source software is available for automated analysis and can sometimes be linked with data in the open-source repository.


Menzies and Zimmermann (Menzies and Zimmermann 2013) define software analytics as:

Software analytics is analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from their data to make better decisions.

Menzies and Zimmermann emphasize that the point of analytics is not to derive general theories about software but to identify specific issues that are of interest to software developers and managers. Analytics aims to provide information about these issues in real time so that actions can be taken in response to the information provided by the analysis.

In a study of managers at Microsoft, Buse and Zimmermann (Buse and Zimmermann 2012) identified information needs such as how to target testing, inspections, and refactoring, when to release software, and how to understand the needs of software customers.

A range of different data mining and analysis tools can be used for software analytics (Witten, Frank, and Hall 2011). In general, it is impossible to know which analysis tools are best to use in a particular situation. You have to experiment with several tools to discover which are most effective. Buse and Zimmermann suggest a number of guidelines for tool use:

1. Tools should be easy to use, as managers are unlikely to have experience with analysis.

2. Tools should run quickly and produce concise outputs rather than large volumes of information.

3. Tools should make many measurements using as many parameters as possible. It is impossible to predict in advance what insights might emerge.

4. Tools should be interactive and allow managers and developers to explore the analyses. They should recognize that managers and developers are interested in different things. They should not be predictive but should support decision making based on the analysis of past and current data.

Zhang and her colleagues (Zhang et al. 2013) describe an excellent practical application of software analytics for performance debugging. User software was instrumented to collect data on response times and the associated system state. When the response time was greater than expected, this data was sent for analysis. The automated analysis highlighted performance bottlenecks in the software. The development team could then improve the algorithms to eliminate the bottleneck so that performance was improved in a later software release.
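A minimal sketch of this style of instrumentation is shown below. The operation name, the fixed threshold, and the idea of buffering slow responses for later analysis are illustrative assumptions for the sketch, not details of the system that Zhang and her colleagues describe:

```python
import time

SLOW_THRESHOLD = 0.5   # seconds; illustrative value
slow_responses = []    # records that would be sent for automated analysis

def instrumented(operation):
    """Wrap an operation so that unexpectedly slow calls are recorded
    together with enough state to locate the bottleneck later."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = operation(*args, **kwargs)
        elapsed = time.perf_counter() - start
        if elapsed > SLOW_THRESHOLD:
            slow_responses.append({"operation": operation.__name__,
                                   "elapsed": elapsed})
        return result
    return wrapper

@instrumented
def generate_report(rows):
    time.sleep(0.6)            # stands in for a slow algorithm
    return len(rows)

generate_report(range(10))
print([r["operation"] for r in slow_responses])  # ['generate_report']
```

In a real deployment, the recorded state would be richer and the records would be transmitted to the developer's servers for automated analysis rather than kept in memory.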

At the time of writing, software analytics is immature, and it is too early to say what effect it will have. Not only are there general problems of “big data” processing (Harford 2013), but it will always be the case that our knowledge depends on collected data from large companies. This data is primarily from software products, and it is unclear if the tools and techniques that are appropriate for products can also be used with custom software. Small companies are unlikely to invest in the data collection systems that are required for automated analysis and so they may not be able to use software analytics.


Key Points

Software quality management is concerned with ensuring that software has a low number of defects and that it reaches the required standards of maintainability, reliability, portability, and so forth. It includes defining standards for processes and products and establishing processes to check that these standards have been followed.

Software standards are important for quality assurance as they represent an identification of best practice. When developing software, standards provide a solid foundation for building good-quality software.

Reviews of the software process deliverables involve a team of people who check that quality standards are being followed. Reviews are the most widely used technique for assessing quality.

In a program inspection or peer review, a small team systematically checks the code. They read the code in detail and look for possible errors and omissions. The problems detected are then discussed at a code review meeting.

Agile quality management does not usually rely on a separate quality management team. Instead, it relies on establishing a quality culture where the development team works together to improve software quality.

Software measurement can be used to gather quantitative data about software and the software process. You may be able to use the values of the software metrics that are collected to make inferences about product and process quality.

Product quality metrics are particularly useful for highlighting anomalous components that may have quality problems. These components should then be analyzed in more detail.

Software analytics is the automated analysis of large volumes of software product and process data to discover relationships that may provide insights for project managers and developers.

Further Reading

Software Quality Assurance: From Theory to Implementation. An excellent, still relevant, book on the principles and practice of software quality assurance. It includes a discussion of standards such as ISO 9001. (D. Galin, Addison-Wesley, 2004).

“Misleading Metrics and Unsound Analyses.” An excellent article by leading metrics researchers that discusses the difficulties of understanding what measurements really mean. (B. Kitchenham, R. Jeffrey and C. Connaughton, IEEE Software, 24 (2), March–April 2007). http://dx.doi.org/10.1109/MS.2007.49

“A Practical Guide to Implementing an Agile QA Process on Scrum Projects.” This slide set presents an overview of how to integrate software quality assurance with agile development using Scrum. (S. Rayhan, 2008). https://www.scrumalliance.org/system/resource_files/0000/0459/agileqa.pdf

“Software Analytics: So What?” This is a good introductory article that explains what software analytics is and why it is increasingly important. It is the introduction to a special issue on software analytics, and you may find several other articles in that issue to be helpful in understanding software analytics. (T. Menzies and T. Zimmermann, IEEE Software, 30 (4), July–August 2013). http://dx.doi.org/10.1109/MS.2013.86


Web Site

PowerPoint slides for this chapter: www.pearsonglobaleditions.com/Sommerville

Links to supporting videos: http://software-engineering-book.com/videos/software-management/

Exercises

24.1. Define the terms quality assurance and quality control. List out the key points included in Humphrey’s outline structure for software management.

24.2. Explain how standards may be used to capture organizational wisdom about effective methods of software development. Suggest four types of knowledge that might be captured in organizational standards.

24.3. Discuss the assessment of software quality according to the quality attributes shown in Figure 24.2. You should consider each attribute in turn and explain how it might be assessed.

24.4. Briefly describe possible standards that might be used for:

the use of control constructs in C, C#, or Java;

reports that might be submitted for a term project in a university;

the process of making and approving program changes (web Chapter 26); and

the process of purchasing and installing a new computer.

24.5. Assume you work for an organization that develops database products for individuals and small businesses. This organization is interested in quantifying its software development. Write a report suggesting appropriate metrics and suggest how these can be collected.

24.6. Briefly explain what happens during the software quality review process and the software quality inspection process.

24.7. What problems are likely to arise if formalized program inspections are introduced in a company where some software is developed using agile methods?

24.8. What is a software metric? Define different types of software metrics with examples.

24.9. You work for a software product company and your manager has read an article on software analytics. She asks you to do some research in this area. Survey the literature on analytics and write a short report that summarizes work in software analytics and issues to be considered if analytics is introduced.

24.10. A colleague who is a very good programmer produces software with a low number of defects but consistently ignores organizational quality standards. How should her managers react to this behavior?


References

Bamford, R., and W. J. Deibler. 2003. “ISO 9001:2000 for Software and Systems Providers: An Engineering Approach.” Boca Raton, FL: CRC Press.

Buse, R. P. L., and T. Zimmermann. 2012. “Information Needs for Software Development Analytics.” In Int. Conf. on Software Engineering, 987–996. doi:10.1109/ICSE.2012.6227122.

Chidamber, S., and C. Kemerer. 1994. “A Metrics Suite for Object-Oriented Design.” IEEE Trans. on Software Eng. 20 (6): 476–493. doi:10.1109/32.295895.

El-Emam, K. 2001. “Object-Oriented Metrics: A Review of Theory and Practice.” National Research Council of Canada. http://seg.iit.nrc.ca/English/abstracts/NRC44190.html.

Fagan, M. E. 1986. “Advances in Software Inspections.” IEEE Trans. on Software Eng. SE-12 (7): 744–751. doi:10.1109/TSE.1986.6312976.

Harford, T. 2013. “Big Data: Are We Making a Big Mistake?” Financial Times, March 28. http://timharford.com/2014/04/big-data-are-we-making-a-big-mistake/

Humphrey, W. 1989. Managing the Software Process. Reading, MA: Addison-Wesley.

IEEE. 2003. IEEE Software Engineering Standards Collection on CD-ROM. Los Alamitos, CA: IEEE Computer Society Press.

Ince, D. 1994. ISO 9001 and Software Quality Assurance. London: McGraw-Hill.

Kitchenham, B. 1990. “Software Development Cost Models.” In Software Reliability Handbook, edited by P. Rook, 487–517. Amsterdam: Elsevier.

McConnell, S. 2004. Code Complete: A Practical Handbook of Software Construction, 2nd ed. Seattle, WA: Microsoft Press.

Menzies, T., and T. Zimmermann. 2013. “Software Analytics: So What?” IEEE Software 30 (4): 31–37. doi:10.1109/MS.2013.86.

Witten, I. H., E. Frank, and M. A. Hall. 2011. Data Mining: Practical Machine Learning Tools and Techniques. Burlington, MA: Morgan Kaufmann.

Zhang, D., S. Han, Y. Dang, J-G. Lou, H. Zhang, and T. Xie. 2013. “Software Analytics in Practice.” IEEE Software 30 (5): 30–37. doi:10.1109/MS.2013.94.

25 Configuration management

Objectives

The objective of this chapter is to introduce you to software configuration management processes and tools. When you have read the chapter, you will:

know the essential functionality that should be provided by a version control system, and how this is realized in centralized and distributed systems;

understand the challenges of system building and the benefits of continuous integration and system building;

understand why software change management is important and the essential activities in the change management process;

understand the basics of software release management and how it differs from version management.

Contents

25.1 Version management

25.2 System building

25.3 Change management

25.4 Release management


Software systems are constantly changing during development and use. Bugs are discovered and have to be fixed. System requirements change, and you have to implement these changes in a new version of the system. New versions of hardware and system platforms are released, and you have to adapt your systems to work with them. Competitors introduce new features in their system that you have to match. As changes are made to the software, a new version of a system is created. Most systems, therefore, can be thought of as a set of versions, each of which may have to be maintained and managed.

Configuration management (CM) is concerned with the policies, processes, and tools for managing changing software systems (Aiello and Sachs 2011). You need to manage evolving systems because it is easy to lose track of what changes and component versions have been incorporated into each system version. Versions implement proposals for change, corrections of faults, and adaptations for different hardware and operating systems. Several versions may be under development and in use at the same time. If you don’t have effective configuration management procedures in place, you may waste effort modifying the wrong version of a system, delivering the wrong version of a system to customers, or forgetting where the software source code for a particular version of the system or component is stored.

Configuration management is useful for individual projects as it is easy for one person to forget what changes have been made. It is essential for team projects where several developers are working at the same time on a software system. Sometimes these developers are all working in the same place, but, increasingly, development teams are distributed with members in different locations across the world. The configuration management system provides team members with access to the system being developed and manages the changes that they make to the code.

The configuration management of a software system product involves four closely related activities (Figure 25.1):

1. Version control This involves keeping track of the multiple versions of system components and ensuring that changes made to components by different developers do not interfere with each other.

2. System building This is the process of assembling program components, data, and libraries, then compiling and linking these to create an executable system.

3. Change management This involves keeping track of requests for changes to delivered software from customers and developers, working out the costs and impact of making these changes, and deciding if and when the changes should be implemented.

4. Release management This involves preparing software for external release and keeping track of the system versions that have been released for customer use.

Because of the large volume of information to be managed and the relationships between configuration items, tool support is essential for configuration management. Configuration management tools are used to store versions of system components, build systems from these components, track the releases of system versions to customers, and keep track of change proposals. CM tools range from simple tools that support a single configuration management task, such as bug tracking, to integrated environments that support all configuration management activities.

Figure 25.1 Configuration management activities

Agile development, where components and systems are changed several times a day, is impossible without using CM tools. The definitive versions of components are held in a shared project repository, and developers copy them into their own workspace. They make changes to the code and then use system-building tools to create a new system on their own computer for testing. Once they are happy with the changes made, they return the modified components to the project repository. This makes the modified components available to other team members.

The development of a software product or custom software system takes place in three distinct phases:

1. A development phase where the development team is responsible for managing the software configuration and new functionality is being added to the software. The development team decides on the changes to be made to the system.

2. A system testing phase where a version of the system is released internally for testing. This may be the responsibility of a quality management team or an individual or group within the development team. At this stage, no new functionality is added to the system. The changes made at this stage are bug fixes, performance improvements, and security vulnerability repairs. There may be some customer involvement as beta testers during this phase.

3. A release phase where the software is released to customers for use. After the release has been distributed, customers may submit bug reports and change requests. New versions of the released system may be developed to repair bugs and vulnerabilities and to include new features suggested by customers.

Figure 25.2 Multiversion system development

For large systems, there is never just one “working” version of a system; there are always several versions of the system at different stages of development. Several teams may be involved in the development of different system versions. Figure 25.2 shows situations where three versions of a system are being developed:

1. Version 1.5 of the system has been developed to repair bugs and improve the performance of the first release of the system. It is the basis of the second system release (R1.1).

2. Version 2.4 is being tested with a view to it becoming release 2.0 of the system. No new features are being added at this stage.

3. Version 3 is a development system where new features are being added in response to change requests from customers and the development team. This will eventually be released as release 3.0.

These different versions have many common components as well as components or component versions that are unique to that system version. The CM system keeps track of the components that are part of each version and includes them as required in the system build.

In large software projects, configuration management is sometimes part of software quality management (covered in Chapter 24). The quality manager is responsible for both quality management and configuration management. When a pre-release version of the software is ready, the development team hands it over to the quality management team. The QM team checks that the system quality is acceptable. If so, it then becomes a controlled system, which means that all changes to the system have to be agreed on and recorded before they are implemented.

Many specialized terms are used in configuration management. Unfortunately, these are not standardized. Military software systems were the first systems in which software CM was used, so the terminology for these systems reflected the processes and terminology used in hardware configuration management. Commercial systems developers did not know about military procedures or terminology and so often invented their own terms. Agile methods have also devised new terminology in order to distinguish the agile approach from traditional CM methods.


Figure 25.3 CM terminology

Baseline: A collection of component versions that make up a system. Baselines are controlled, which means that the component versions used in the baseline cannot be changed. It is always possible to re-create a baseline from its constituent components.

Branching: The creation of a new codeline from a version in an existing codeline. The new codeline and the existing codeline may then develop independently.

Codeline: A set of versions of a software component and other configuration items on which that component depends.

Configuration (version) control: The process of ensuring that versions of systems and components are recorded and maintained so that changes are managed and all versions of components are identified and stored for the lifetime of the system.

Configuration item or software configuration item (SCI): Anything associated with a software project (design, code, test data, document, etc.) that has been placed under configuration control. Configuration items always have a unique identifier.

Mainline: A sequence of baselines representing different versions of a system.

Merging: The creation of a new version of a software component by merging separate versions in different codelines. These codelines may have been created by a previous branch of one of the codelines involved.

Release: A version of a system that has been released to customers (or other users in an organization) for use.

Repository: A shared database of versions of software components and meta-information about changes to these components.

System building: The creation of an executable system version by compiling and linking the appropriate versions of the components and libraries making up the system.

Version: An instance of a configuration item that differs, in some way, from other instances of that item. Versions should always have a unique identifier.

Workspace: A private work area where software can be modified without affecting other developers who may be using or modifying that software.

The definition and use of configuration management standards are essential for quality certification in both ISO 9000 and the SEI’s capability maturity model (Bamford and Deibler 2003; Chrissis, Konrad, and Shrum 2011). CM standards in a company may be based on generic standards such as IEEE 828-2012, an IEEE standard for configuration management. These standards focus on CM processes and the documents produced during the CM process (IEEE 2012). Using the external standards as a starting point, companies may then develop more detailed, company-specific standards that are tailored to their specific needs. However, agile methods rarely use these standards because of the documentation overhead involved.


25.1 Version management

Version management is the process of keeping track of different versions of software components and the systems in which these components are used. It also involves ensuring that changes made by different developers to these versions do not interfere with each other. In other words, version management is the process of managing codelines and baselines.

Figure 25.4 illustrates the differences between codelines and baselines. A codeline is a sequence of versions of source code, with later versions in the sequence derived from earlier versions. Codelines normally apply to components of systems so that there are different versions of each component. A baseline is a definition of a specific system. The baseline specifies the component versions that are included in the system and identifies the libraries used, configuration files, and other system information. In Figure 25.4, you can see that different baselines use different versions of the components from each codeline. In the diagram, I have shaded the boxes representing components in the baseline definition to indicate that these are actually references to components in a codeline. The mainline is a sequence of system versions developed from an original baseline.

Baselines may be specified using a configuration language in which you define what components should be included in a specific version of a system. It is possible to explicitly specify an individual component version (X.1.2, say) or simply to specify the component identifier (X). If you simply include the component identifier in the configuration description, the most recent version of the component should be used.
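This resolution rule can be sketched as follows. The "configuration language" here is an invented, minimal one: a baseline is just a list mixing exact versions (such as "X.1.2") with bare component identifiers, and the codeline data is made up for the example:

```python
# Codelines: versions of each component, oldest first (illustrative data).
codelines = {
    "X": ["X.1.0", "X.1.1", "X.1.2", "X.1.3"],
    "Y": ["Y.1.0", "Y.1.1"],
}

def resolve_baseline(spec, codelines):
    """Turn a baseline specification into concrete component versions."""
    resolved = []
    for item in spec:
        if item in codelines:                 # bare identifier: use newest
            resolved.append(codelines[item][-1])
        else:                                 # explicit version: use as given
            resolved.append(item)
    return resolved

print(resolve_baseline(["X.1.2", "Y"], codelines))  # ['X.1.2', 'Y.1.1']
```

Note that a baseline resolved this way is only reproducible if it is recorded with the explicit versions that were chosen, which is why controlled baselines fix their component versions.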

Baselines are important because you often have to re-create an individual version of a system. For example, a product line may be instantiated so that there are specific system versions for each system customer. You may have to re-create the version delivered to a customer if they report bugs in their system that have to be repaired.

Version control (VC) systems identify, store, and control access to the different versions of components. There are two types of modern version control system:

1. Centralized systems, where a single master repository maintains all versions of the software components that are being developed. Subversion (Pilato, Collins-Sussman, and Fitzpatrick 2008) is a widely used example of a centralized VC system.

2. Distributed systems, where multiple versions of the component repository exist at the same time. Git (Loeliger and McCullough 2012) is a widely used example of a distributed VC system.

Centralized and distributed VC systems provide comparable functionality but implement this functionality in different ways. Key features of these systems include:

1. Version and release identification Managed versions of a component are assigned unique identifiers when they are submitted to the system. These identifiers allow different versions of the same component to be managed, without changing the component name. Versions may also be assigned attributes, with the set of attributes used to uniquely identify each version.


Figure 25.4 Codelines and baselines

2. Change history recording The VC system keeps records of the changes that have been made to create a new version of a component from an earlier version. In some systems, these changes may be used to select a particular system version. This involves tagging components with keywords describing the changes made. You then use these tags to select the components to be included in a baseline.
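As a concrete illustration of tag-based selection, the sketch below picks, for each component, the most recent version carrying a given tag. The version records and the tag vocabulary are invented for the example:

```python
# Hypothetical change-history records per component, oldest first.
history = {
    "A": [("A1.1", {"bug-fix"}), ("A1.2", {"performance"}),
          ("A1.3", {"bug-fix"})],
    "B": [("B1.1", {"bug-fix"}), ("B1.2", {"new-feature"})],
}

def select_for_baseline(history, tag):
    """For each component, choose the latest version tagged with `tag`."""
    baseline = {}
    for component, versions in history.items():
        for version, tags in versions:
            if tag in tags:
                baseline[component] = version  # later entries overwrite earlier
    return baseline

print(select_for_baseline(history, "bug-fix"))  # {'A': 'A1.3', 'B': 'B1.1'}
```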

3. Independent development Different developers may be working on the same component at the same time. The version control system keeps track of components that have been checked out for editing and ensures that changes made to a component by different developers do not interfere.

4. Project support A version control system may support the development of several projects, which share components. It is usually possible to check in and check out all of the files associated with a project rather than having to work with one file or directory at a time.

5. Storage management Rather than maintain separate copies of all versions of a component, the version control system may use efficient mechanisms to ensure that duplicate copies of identical files are not maintained. Where there are only small differences between files, the VC system may store these differences rather than maintain multiple copies of files. A specific version may be automatically re-created by applying the differences to a master version.
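A line-based version of this difference mechanism can be sketched with the standard library, as below. Real VC systems use far more sophisticated delta storage, so treat this purely as an illustration of the idea of storing differences and replaying them against a master version:

```python
import difflib

def make_delta(old_lines, new_lines):
    """Record only the operations needed to turn old_lines into new_lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    return [(tag, i1, i2, new_lines[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def apply_delta(old_lines, delta):
    """Re-create the newer version from the master version plus the delta."""
    result = list(old_lines)
    # Apply operations back to front so earlier indices remain valid.
    for tag, i1, i2, new in sorted(delta, key=lambda op: op[1], reverse=True):
        result[i1:i2] = new
    return result

v1 = ["a = 1", "b = 2", "c = 3"]
v2 = ["a = 1", "b = 20", "c = 3", "d = 4"]
delta = make_delta(v1, v2)           # much smaller than a full copy of v2
print(apply_delta(v1, delta) == v2)  # True
```

Only the changed regions are stored in the delta, which is why delta-based storage saves space when successive versions differ only slightly.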

Most software development is a team activity, so several team members often work on the same component at the same time. For example, let’s say Alice is making some changes to a system, which involves changing components A, B, and C. At the same time, Bob is working on changes that require making changes to components X, Y, and C. Both Alice and Bob are therefore changing C. It’s important to avoid situations where changes interfere with each other—Bob’s changes to C overwriting Alice’s or vice versa.

To support independent development without interference, all version

control

systems use the concept of a project repository and a private workspace.

The project repository maintains the “master” version of all components,

which is used to create baselines for system building. When modifying

components, developers copy

(check-out) these from the repository into their workspace and work on these copies.

Figure 25.5 Check-in and check-out from a centralized version repository (Alice checks out A1.0, B1.0, and C1.0 and checks in A1.1, B1.1, and C1.1; Bob checks out X1.0, Y1.0, and C1.0, and his check-in of C creates version C1.2)

When they have completed their changes, the changed components are returned (checked in) to the repository. However, centralized and distributed VC systems support independent development of shared components in different ways.

In centralized systems, developers check out components or directories of components from the project repository into their private workspace and work on these copies in their private workspace. When their changes are complete, they check the components back in to the repository. This creates a new component version that may then be shared. For an illustration, see Figure 25.5.

Here, Alice has checked out versions A1.0, B1.0, and C1.0. She has worked on these versions and has created new versions A1.1, B1.1, and C1.1. She checks these new versions into the repository. Bob checks out X1.0, Y1.0, and C1.0. He creates new versions of these components and checks them back in to the repository. However, Alice has already created a new version of C while Bob has been working on it. His check-in therefore creates another version, C1.2, so that Alice's changes are not overwritten.

If two or more people are working on a component at the same time, each must check out the component from the repository. If a component has been checked out, the version control system warns other users wanting to check out that component that it has been checked out by someone else. The system will also ensure that when the modified components are checked in, the different versions are assigned different version identifiers and are stored separately.
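This check-out/check-in discipline can be illustrated with a minimal Python sketch. This is a toy model, not a real version control system; the class and method names are invented for illustration. Every check-in is given a fresh version identifier, so concurrent changes to the same component never overwrite each other, and a user checking out a component that someone else is editing receives a warning flag.

```python
class Repository:
    """Toy centralized repository: every check-in gets a new version id."""

    def __init__(self):
        self.history = {}   # component name -> list of (version, content)
        self.editors = {}   # component name -> set of users editing it

    def add(self, component, content):
        self.history[component] = [("1.0", content)]

    def check_out(self, user, component):
        # Warn (busy=True) if another developer has already checked this out.
        busy = bool(self.editors.setdefault(component, set()))
        self.editors[component].add(user)
        version, content = self.history[component][-1]
        return version, content, busy

    def check_in(self, user, component, content):
        # Assign a fresh identifier so earlier check-ins are never overwritten.
        major, minor = self.history[component][-1][0].split(".")
        new_version = f"{major}.{int(minor) + 1}"
        self.history[component].append((new_version, content))
        self.editors[component].discard(user)
        return new_version
```

Replaying the Alice and Bob scenario for component C: Alice checks out C1.0, Bob's later check-out triggers a warning, Alice's check-in creates C1.1, and Bob's check-in creates C1.2 alongside it.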

In a distributed VC system, such as Git, a different approach is used. A "master" repository is created on a server that maintains the code produced by the development team. Instead of simply checking out the files that they need, a developer creates a clone of the project repository that is downloaded and installed on his or her computer. Developers work on the files required and maintain the new versions in the private repository on their own computer. When they have finished making changes, they "commit" these changes and update their private repository. They may then "push" these changes to the project repository or tell the integration manager that changed versions are available. He or she may then "pull" these files to the project repository (see Figure 25.6). In this example, both Bob and Alice have cloned the project repository and have updated files. They have not yet pushed these back to the project repository.
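The clone/commit/push workflow can be sketched in a few lines of Python. This is a toy model with invented names, not Git's actual implementation or API: a clone is a complete private copy, commits change only the local copy, and nothing reaches the master repository until a push.

```python
import copy

class DistributedRepo:
    """Toy distributed repository: clone, commit locally, then push."""

    def __init__(self, files=None):
        self.files = dict(files or {})   # filename -> content

    def clone(self):
        # A clone is a complete, independent private copy of the repository.
        return DistributedRepo(copy.deepcopy(self.files))

    def commit(self, filename, content):
        # Changes are recorded locally; the master repository is untouched.
        self.files[filename] = content

    def push(self, master):
        # Publish the locally committed versions to the master repository.
        master.files.update(self.files)
```

For example, Alice clones the master, commits a new version of C in her clone, and the master still holds the old version until she pushes.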

Figure 25.6 Repository cloning (the master repository holds versions A1.0 through P1.0; Alice's clone contains updated versions A1.1, B1.1, and C1.1, and Bob's clone contains updated C1.1, X1.1, and Y1.1, none yet pushed back)

This model of development has a number of advantages:

1. It provides a backup mechanism for the repository. If the repository is corrupted, work can continue and the project repository can be restored from local copies.

2. It allows for offline working so that developers can commit changes even if they do not have a network connection.

3. Project support is the default way of working. Developers can compile and test the entire system on their local machines and test the changes they have made.

Distributed version control is essential for open-source development, where several people may be working simultaneously on the same system without any central coordination. There is no way for the open-source system "manager" to know when changes will be made. In this case, as well as a private repository on their own computer, developers also maintain a public server repository to which they push new versions of components that they have changed. It is then up to the open-source system "manager" to decide when to pull these changes into the definitive system. This organization is shown in Figure 25.7.

In this example, Charlie is the integration manager for the open-source system. Alice and Bob work independently on system development and clone the definitive project repository (1). As well as their private repositories, both Alice and Bob maintain a public repository on a server that can be accessed by Charlie. When they have made and tested changes, they push the changed versions from their private repositories to their personal public repositories and tell Charlie that these repositories are available (2). Charlie pulls these from their repositories into his

Figure 25.7 Open-source development (Alice and Bob clone the definitive project repository (1) and push tested changes to their public repositories (2); Charlie pulls these into his private repository (3) and then updates the definitive project repository (4))

own private repository for testing (3). Once he is satisfied that the changes are acceptable, he then updates the definitive project repository (4).

A consequence of the independent development of the same component is that codelines may branch. Rather than a linear sequence of versions that reflect changes to the component over time, there may be several independent sequences, as shown in Figure 25.8. This is normal in system development, where different developers work independently on different versions of the source code and change it in different ways. It is generally recommended when working on a system that a new branch should be created so that changes do not accidentally break a working system.

At some stage, it may be necessary to merge codeline branches to create a new version of a component that includes all changes that have been made. This is also shown in Figure 25.8, where component versions 2.1.2 and 2.3 are merged to create version 2.4. If the changes made involve completely different parts of the code, the component versions may be merged automatically by the version control system by combining the code changes. This is the normal mode of operation when new features have been added. These code changes are merged into the master copy of the system. However, the changes made by different developers sometimes overlap. The changes may be incompatible and interfere with each other. In this case, a developer has to check for clashes and make changes to the components to resolve the incompatibilities between the different versions.
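The distinction between automatic merging and clashes can be illustrated with a small Python sketch. This is a hypothetical, much-simplified merge over line-numbered edits, not how any real merge tool works: edits to different lines combine automatically, while a line changed by both developers is reported as a clash to be resolved by hand.

```python
def merge(base, alice_edits, bob_edits):
    """Toy merge of two sets of edits against a common base version.

    base       -- list of source lines
    *_edits    -- dict mapping line number -> replacement text
    Returns (merged_lines, clashes). Non-overlapping edits are applied
    automatically; clashing line numbers are left at the base version
    and reported for manual resolution.
    """
    clashes = sorted(set(alice_edits) & set(bob_edits))
    merged = list(base)
    for lineno, text in {**alice_edits, **bob_edits}.items():
        if lineno not in clashes:
            merged[lineno] = text
    return merged, clashes
```

When Alice and Bob edit completely different lines, the result contains both changes; when they edit the same line, the merge flags it instead of silently picking a winner.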

When version control systems were first developed, storage management was one of their most important functions. Disk space was expensive, and it was important to minimize the disk space used by the different copies of components.

Figure 25.8 Branching and merging (Codeline 1: V1.0, V1.1, V1.2; Codeline 2: V2.0, V2.1, V2.2, V2.3, with a branch to Codeline 2.1 holding V2.1.1 and V2.1.2; versions V2.1.2 and V2.3 are merged to create V2.4)

Figure 25.9 Storage management using deltas (the most recent version, V1.3, is stored in full as source code; earlier versions 1.2, 1.1, and 1.0 are re-created by applying deltas D3, D2, and D1)

Instead of keeping a complete copy of each version, the system stores a list of differences (deltas) between one version and another. By applying these to a master version (usually the most recent version), a target version can be re-created. This is illustrated in Figure 25.9.

When a new version is created, the system simply stores a delta, a list of differences, between the new version and the older version used to create that new version. In Figure 25.9, the shaded boxes represent earlier versions of a component that are automatically re-created from the most recent component version. Deltas are usually stored as lists of changed lines, and, by applying these automatically, one version of a component can be created from another. As the most recent version of a component will most likely be the one used, most systems store that version in full. The deltas then define how to re-create earlier system versions.
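The delta idea can be sketched in Python using the standard difflib module. This is an illustrative model, not the storage format of any particular version control system: the most recent version is kept in full, and a delta records only the regions that must be replaced to rebuild an earlier version.

```python
import difflib

def make_delta(new, old):
    """Compute a delta that rebuilds `old` (an earlier version) from `new`
    (the most recent version, which is stored in full)."""
    sm = difflib.SequenceMatcher(a=new, b=old)
    # Keep only the regions of `new` that differ from the earlier version.
    return [(i1, i2, old[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def apply_delta(new, delta):
    """Re-create the earlier version by applying the delta to `new`."""
    old, pos = [], 0
    for i1, i2, replacement in delta:
        old.extend(new[pos:i1])   # region unchanged between the versions
        old.extend(replacement)   # region taken from the earlier version
        pos = i2
    old.extend(new[pos:])
    return old
```

For unchanged components the delta is empty, so storing many versions of a slowly changing file costs little more than storing one.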

One of the problems with a delta-based approach to storage management is that it can take a long time to apply all of the deltas. As disk storage is now relatively cheap, Git uses an alternative, faster approach. Git does not use deltas but applies a standard compression algorithm to stored files and their associated meta-information. It does not store duplicate copies of files. Retrieving a file simply involves decompressing it, with no need to apply a chain of operations. Git also uses the notion of packfiles, where several smaller files are combined into an indexed single file. This reduces the overhead associated with lots of small files. Deltas are used within packfiles to further reduce their size.
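A simplified sketch of this storage scheme in Python, using the standard hashlib and zlib modules: each file is compressed and keyed by a hash of its content, so identical files are stored only once and retrieval is a single decompression. This is illustrative only; real Git object storage adds object headers, a directory layout, and packfile indexes.

```python
import hashlib
import zlib

class ObjectStore:
    """Sketch of Git-style storage: compress each file and key it by the
    hash of its content, so duplicate files are stored only once."""

    def __init__(self):
        self.objects = {}   # content hash -> compressed bytes

    def store(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        if key not in self.objects:           # no duplicate copies
            self.objects[key] = zlib.compress(data)
        return key

    def retrieve(self, key: str) -> bytes:
        # A single decompression; no chain of deltas to apply.
        return zlib.decompress(self.objects[key])
```

Because the key is derived from the content, storing the same file twice is detected automatically, which is how duplicate copies are avoided.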

25.2 System building

System building is the process of creating a complete, executable system by compiling and linking the system components, external libraries, configuration files, and other information. System-building tools and version control tools must be integrated, as the build process takes component versions from the repository managed by the version control system.

System building involves assembling a large amount of information about the software and its operating environment. Therefore, it always makes sense to use an automated build tool to create a system build (Figure 25.10). Notice that you don't just need the source code files that are involved in the build. You may have to link these with externally provided libraries, data files (such as a file of error messages), and configuration files that define the target installation. You may have to specify the versions of

the compiler and other software tools that are to be used in the build. Ideally, you should be able to build a complete system with a single command or mouse click.

Figure 25.10 System building (an automated build system takes source code files, configuration files, data files, libraries, and compilers and tools as inputs, and produces executable tests, an executable target system, and test results)

Tools for system integration and building include some or all of the following features:

1. Build script generation The build system should analyze the program that is being built, identify dependent components, and automatically generate a build script (configuration file). The system should also support the manual creation and editing of build scripts.

2. Version control system integration The build system should check out the required versions of components from the version control system.

3. Minimal recompilation The build system should work out what source code needs to be recompiled and set up compilations if required.

4. Executable system creation The build system should link the compiled object code files with each other and with other required files, such as libraries and configuration files, to create an executable system.

5. Test automation Some build systems can automatically run automated tests using test automation tools such as JUnit. These check that the build has not been "broken" by changes.

6. Reporting The build system should provide reports about the success or failure of the build and the tests that have been run.

7. Documentation generation The build system may be able to generate release notes about the build and system help pages.

The build script is a definition of the system to be built. It includes information about components and their dependencies, and the versions of tools used to compile and link the system. The configuration language used to define the build script includes constructs to describe the system components to be included in the build and their dependencies.
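The dependency information in a build script is what lets a build tool work out a safe build order. A minimal Python sketch of that computation follows: a topological sort that builds each component only after everything it depends on, and reports dependency cycles. The component names are invented for illustration, and real build tools add much more (change detection, parallelism, tool version selection).

```python
def build_order(dependencies):
    """Given a mapping component -> list of components it depends on,
    return an order in which each component is built only after all of
    its dependencies. Raises ValueError on a dependency cycle."""
    order, done, in_progress = [], set(), set()

    def visit(component):
        if component in done:
            return
        if component in in_progress:      # back edge: a cycle
            raise ValueError(f"dependency cycle involving {component}")
        in_progress.add(component)
        for dep in dependencies.get(component, []):
            visit(dep)                    # build dependencies first
        in_progress.discard(component)
        done.add(component)
        order.append(component)

    for component in dependencies:
        visit(component)
    return order
```

For a script saying that an application depends on a UI and a database layer, both of which depend on a shared library, the library is always built first and the application last.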

Building is a complex process, which is potentially error-prone, as three different system platforms may be involved (Figure 25.11):

1. The development system, which includes development tools such as compilers and source code editors. Developers check out code from the version control system into

a private workspace before making changes to the system. They may wish to build a version of a system for testing in their development environment before committing changes that they have made to the version control system. This involves using local build tools that use checked-out versions of components in the private workspace.

Figure 25.11 Development, build, and target platforms (developers check code in and out between a private workspace on the development system and the version management and build server; the build server builds the executable system for the target platform)

2. The build server, which is used to build definitive, executable versions of the system. This server maintains the definitive versions of a system. All of the system developers check in code to the version control system on the build server for system building.

3. The target environment, which is the platform on which the system executes. This may be the same type of computer that is used for the development and build systems. However, for real-time and embedded systems, the target environment is often smaller and simpler than the development environment (e.g., a cell phone). For large systems, the target environment may include databases and other application systems that cannot be installed on development machines. In these situations, it is not possible to build and test the system on the development computer or on the build server.

Agile methods recommend that very frequent system builds should be carried out, with automated testing used to discover software problems. Frequent builds are part of a process of continuous integration, as shown in Figure 25.12. In keeping with the agile methods notion of making many small changes, continuous integration involves rebuilding the mainline frequently, after small source code changes have been made. The steps in continuous integration are:

1. Extract the mainline system from the VC system into the developer's private workspace.

2. Build the system and run automated tests to ensure that the built system passes all tests. If not, the build is broken, and you should inform whoever checked in the last baseline system. He or she is responsible for repairing the problem.

3. Make the changes to the system components.

4. Build the system in a private workspace and rerun system tests. If the tests fail, continue editing.

Figure 25.12 Continuous integration (check out the mainline, build and test the system, make changes, build and test in the private workspace, check in to the build server, build and test on the build server, and commit the changes to the version management system; failed tests at any stage return the developer to editing)

5. Once the system has passed its tests, check it into the build system server but do not commit it as a new system baseline in the VC system.

6. Build the system on the build server and run the tests. Alternatively, if you are using Git, you can pull recent changes from the server to your private workspace. You need to do this in case others have modified components since you checked out the system. If this is the case, check out the components that have failed and edit these so that tests pass on your private workspace.

7. If the system passes its tests on the build system, then commit the changes you have made as a new baseline in the system mainline.

Tools such as Jenkins (Smart 2011) are used to support continuous integration. These tools can be set up to build a system as soon as a developer has completed a repository update.

The advantage of continuous integration is that it allows problems caused by the interactions between different developers to be discovered and repaired as soon as possible. The most recent system in the mainline is the definitive working system. However, although continuous integration is a good idea, it is not always possible to implement this approach to system building:

1. If the system is very large, it may take a long time to build and test, especially if integration with other application systems is involved. It may be impractical to build the system being developed several times per day.

2. If the development platform is different from the target platform, it may not be possible to run system tests in the developer's private workspace. There may be differences in hardware, operating system, or installed software. Therefore, more time is required for testing the system.

For large systems or for systems where the execution platform is not the same as the development platform, continuous integration is usually impossible. In those circumstances, frequent system building is supported using a daily build system:

1. The development organization sets a delivery time (say 2 p.m.) for system components. If developers have new versions of the components that they are writing, they must deliver them by that time. Components may be incomplete but should provide some basic functionality that can be tested.

2. A new version of the system is built from these components by compiling and linking them to form a complete system.

3. This system is then delivered to the testing team, which carries out a set of predefined system tests.

4. Faults that are discovered during system testing are documented and returned to the system developers. They repair these faults in a subsequent version of the component.

The advantages of using frequent builds of software are that the chances of finding problems stemming from component interactions early in the process are increased. Frequent building encourages thorough unit testing of components. Psychologically, developers are put under pressure not to "break the build"; that is, they try to avoid checking in versions of components that cause the whole system to fail. They are therefore reluctant to deliver new component versions that have not been properly tested. Consequently, less time is spent during system testing discovering and coping with software faults that could have been found by the developer.

As compilation is a computationally intensive process, tools to support system building may be designed to minimize the amount of compilation that is required. They do this by checking if a compiled version of a component is available. If so, there is no need to recompile that component. Therefore, there has to be a way of unambiguously linking the source code of a component with its equivalent object code.

This linking is accomplished by associating a unique signature with each file where a source code component is stored. The corresponding object code, which has been compiled from the source code, has a related signature. The signature identifies each source code version and is changed when the source code is edited. By comparing the signatures on the source and object code files, it is possible to decide if the source code component was used to generate the object code component.

Two types of signature may be used, as shown in Figure 25.13:

1. Modification timestamps The signature on the source code file is the time and date when that file was modified. If the source code file of a component has been modified after the related object code file, then the system assumes that recompilation to create a new object code file is necessary.

For example, say components Comp.java and Comp.class have modification signatures of 17:03:05:02:14:2014 and 16:58:43:02:14:2014, respectively. This means that the Java code was modified at 3 minutes and 5 seconds past 5 on the 14th of February 2014, and the compiled version was modified at 58 minutes and 43 seconds past 4 on the 14th of February 2014. In this case, the system would automatically recompile Comp.java because the compiled version has an earlier modification date than the most recent version of the component.
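The timestamp rule can be sketched in Python: recompile when the object file is missing or older than its source. This is the same comparison that tools such as make apply; the sketch is illustrative, and the Comp.java/Comp.class file names follow the example above.

```python
import os

def needs_recompilation(source_path, object_path):
    """Timestamp-based check: recompile if the object file is missing
    or has an earlier modification time than its source file."""
    if not os.path.exists(object_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(object_path)
```

Note that this scheme keeps only one object file per source file name, which is why, as discussed below, different versions of a component cannot coexist in the same directory under time-based identification.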

Figure 25.13 Linking source and object code (time-based identification: Comp.java versions V1.0 and V1.1 carry modification timestamps such as 16583102142014 and 17030502142014, and Comp.class carries the timestamp of its compilation; checksum-based identification: each Comp.java version and the Comp.class compiled from it share the same checksum, e.g., 24374509887231 for V1.0 and 37650812555734 for V1.1)

2. Source code checksums The signature on the source code file is a checksum calculated from data in the file. A checksum function calculates a unique number using the source text as input. If you change the source code (even by one character), this will generate a different checksum. You can therefore be confident that source code files with different checksums are actually different. The checksum is assigned to the source code just before compilation and uniquely identifies the source file. The build system then tags the generated object code file with the checksum signature. If there is no object code file with the same signature as the source code file to be included in a system, then recompilation of the source code is necessary.
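The checksum scheme can be sketched in Python using the standard hashlib module. This is an illustrative model, not any real build tool: the compiler is a stand-in function passed in by the caller, object code is tagged with the checksum of the source it was compiled from, and a rebuild happens only when source text with an unseen checksum appears.

```python
import hashlib

def signature(source_text: str) -> str:
    """Checksum signature calculated from the source text itself."""
    return hashlib.sha1(source_text.encode()).hexdigest()

class ObjectCache:
    """Sketch of checksum-based recompilation: object code is keyed by
    the checksum of its source, so many compiled versions coexist and
    a component is recompiled only when its source text is new."""

    def __init__(self, compiler):
        self.compiler = compiler     # stand-in for a real compiler
        self.object_code = {}        # source checksum -> compiled output
        self.compilations = 0

    def get(self, source_text):
        sig = signature(source_text)
        if sig not in self.object_code:   # no object with this signature
            self.object_code[sig] = self.compiler(source_text)
            self.compilations += 1
        return self.object_code[sig]
```

Because the signature, rather than the file name, links source to object code, recompiling an edited component adds a new entry instead of overwriting the old one, which is the parallel-compilation advantage described below.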

As object code files are not normally versioned, the first approach means that only the most recently compiled object code file is maintained in the system. This is normally related to the source code file by name; that is, it has the same name as the source code file but with a different suffix. Therefore, the source file Comp.java may generate the object file Comp.class. Because source and object files are linked by name, it is not usually possible to build different versions of a source code component into the same directory at the same time. The compiler would generate object files with the same name, so only the most recently compiled version would be available.

The checksum approach has the advantage of allowing many different versions of the object code of a component to be maintained at the same time. The signature rather than the filename is the link between source and object code. The source code and object code files have the same signature. Therefore, when you recompile a component, it does not overwrite the object code, as would normally be the case when the timestamp is used. Rather, it generates a new object code file and tags it with the source code signature. Parallel compilation is possible, and different versions of a component may be compiled at the same time.

25.3 Change management

Change is a fact of life for large software systems. Organizational needs and requirements change during the lifetime of a system, bugs have to be repaired, and systems have to adapt to changes in their environment. To ensure that the changes are applied

to the system in a controlled way, you need a set of tool-supported change management processes. Change management is intended to ensure that the evolution of the system is controlled and that the most urgent and cost-effective changes are prioritized.

Figure 25.14 The change management process (customer support checks submitted change requests (CRs), closing invalid ones and registering valid ones; the product development group/CCB assesses and selects CRs; development carries out implementation and cost/impact analysis, modifies the software, tests it, and closes the CRs when the tests pass)

Change management is the process of analyzing the costs and benefits of proposed changes, approving those changes that are cost-effective, and tracking which components in the system have been changed. Figure 25.14 is a model of a change management process that shows the main change management activities. This process should come into effect when the software is handed over for release to customers or for deployment within an organization.

Many variants of this process are in use, depending on whether the software is a custom system, a product line, or an off-the-shelf product. The size of the company also makes a difference: small companies use a less formal process than large companies that are working with corporate or government customers. However, all change management processes should include some way of checking, costing, and approving changes.

Tools to support change management may be relatively simple issue- or bug-tracking systems or software that is integrated with a configuration management package for large-scale systems, such as Rational ClearCase. Issue tracking systems allow anyone to report a bug or make a suggestion for a system change, and they keep track of how the development team has responded to the issues. These systems do not impose a process on the users and so can be used in many different settings. More complex systems are built around a process model of the change management process. They

automate the entire process of handling change requests from the initial customer proposal to final change approval and change submission to the development team.

Figure 25.15 A partially completed change request form:

Change Request Form
Project: SICSA/AppProcessing  Number: 23/02
Change requester: I. Sommerville  Date: 20/07/12
Requested change: The status of applicants (rejected, accepted, etc.) should be shown visually in the displayed list of applicants.
Change analyzer: R. Looek  Analysis date: 25/07/12
Components affected: ApplicantListDisplay, StatusUpdater
Associated components: StudentDatabase
Change assessment: Relatively simple to implement by changing the display color according to status. A table must be added to relate status to colors. No changes to associated components are required.
Change priority: Medium
Change implementation:
Estimated effort: 2 hours
Date to SGA app. team: 28/07/12  CCB decision date: 30/07/12
Decision: Accept change. Change to be implemented in Release 1.2
Change implementor:
Date of change:
Date submitted to QM:  QM decision:
Date submitted to CM:
Comments:

The change management process is initiated when a system stakeholder completes and submits a change request describing the change required to the system. This could be a bug report, where the symptoms of the bug are described, or a request for additional functionality to be added to the system. Some companies handle bug reports and new requirements separately, but, in principle, both are simply change requests. Change requests may be submitted using a change request form (CRF). Stakeholders may be system owners and users, beta testers, developers, or the marketing department of a company.

Electronic change request forms record information that is shared between all groups involved in change management. As the change request is processed, information is added to the CRF to record decisions made at each stage of the process. At any time, it therefore represents a snapshot of the state of the change request. In addition to recording the change required, the CRF records the recommendations regarding the change, the estimated costs of the change, and the dates when the change was requested, approved, implemented, and validated. The CRF may also include a section where a developer outlines how the change may be implemented. Again, the degree of formality in the CRF varies depending on the size and type of organization that is developing the system.

Figure 25.15 is an example of a type of CRF that might be used in a large complex systems engineering project. For smaller projects, I recommend that change requests should be formally recorded; the CRF should focus on describing the

change required, with less emphasis on implementation issues. System developers decide how to implement the change and estimate the time required to complete the change implementation.

Customers and changes

Agile methods emphasize the importance of involving customers in the change prioritization process. The customer representative helps the team decide on the changes that should be implemented in the next development iteration. While this can be effective for systems that are in development for a single customer, it can be a problem in product development where no real customer is working with the team. In those cases, the team has to make its own decisions on change prioritization.

http://software-engineering-book.com/web/agile-changes/

After a change request has been submitted, it is checked to ensure that it is valid. The checker may be from a customer or application support team or, for internal requests, may be a member of the development team. The change request may be rejected at this stage. If the change request is a bug report, the bug may have already been reported and repaired. Sometimes, what people believe to be problems are actually misunderstandings of what the system is expected to do. On occasion, people request features that have already been implemented but that they don't know about. If any of these things are true, the issue is closed and the form is updated with the reason for closure. If it is a valid change request, it is then logged as an outstanding request for subsequent analysis.

For valid change requests, the next stage of the process is change assessment and costing. This function is usually the responsibility of the development or maintenance team, as they can work out what is involved in implementing the change. The impact of the change on the rest of the system must be checked. To do this, you have to identify all of the components affected by the change. If making the change means that further changes elsewhere in the system are needed, this will obviously increase the cost of change implementation. Next, the required changes to the system modules are assessed. Finally, the cost of making the change is estimated, taking into account the costs of changing related components.

Following this analysis, a separate group decides if it is cost-effective for the business to make the change to the software. For military and government systems, this group is often called the change control board (CCB). In industry, it may be called something like a "product development group," responsible for making decisions about how a software system should evolve. This group should review and approve all change requests, unless the changes simply involve correcting minor errors on screen displays, web pages, or documents. These small requests should be passed to the development team for immediate implementation.

The CCB or product development group considers the impact of the change from a strategic and organizational rather than a technical point of view. It decides whether the change in question is economically justified, and it prioritizes accepted changes for implementation. Accepted changes are passed back to the development group;

25.3 Change management 749

rejected change requests are closed and no further action is taken. The

factors that influence the decision on whether or not to implement a

change include:

1. The consequences of not making the change When assessing a change

request, you have to consider what will happen if the change is not

implemented. If the

change is associated with a reported system failure, the seriousness of that

failure has to be taken into account. If the system failure causes the system to

crash,

this is very serious, and failure to make the change may disrupt the

operational

use of the system. On the other hand, if the failure has a minor effect, such

as

incorrect colors on a display, then it is not important to fix the problem

quickly.

The change should therefore have a low priority.

2. The benefits of the change Will the change benefit many users of the

system, or will it only benefit the change proposer?

3. The number of users affected by the change If only a few users are

affected, then the change may be assigned a low priority. In fact, making

the change may be

inadvisable if it means that the majority of system users have to adapt to

it.

4. The costs of making the change If making the change affects many system

components (hence increasing the chances of introducing new bugs) and/

or takes a

lot of time to implement, then the change may be rejected.

5. The product release cycle If a new version of the software has just been

released to customers, it may make sense to delay implementation of the

change until the

next planned release (see Section 25.4).
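These five factors can be illustrated with a rough scoring sketch. The weights and the 0–10 scales below are invented for illustration; in practice, a CCB weighs these factors using judgment, not a formula.

```python
# Hypothetical change-prioritization sketch; the weights are invented.

def change_priority(failure_severity, benefit_breadth, users_affected,
                    cost, next_release_soon):
    """Higher score means higher priority. The numeric inputs are on an
    invented 0-10 scale; cost counts against the change."""
    score = 3 * failure_severity + 2 * benefit_breadth + users_affected - cost
    if next_release_soon:
        score -= 5  # defer the change to the next planned release
    return score

# A crash-causing failure that affects many users scores high:
print(change_priority(failure_severity=9, benefit_breadth=7,
                      users_affected=8, cost=4, next_release_soon=False))
```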

Change management for software products (e.g., a CAD system product),

rather than

custom systems specifically developed for a certain customer, is handled

in a different way. In software products, the customer is not directly

involved in decisions about system evolution, so the relevance of the

change to the customer’s business is not an issue.

Change requests for these products come from the customer support team,

the company

marketing team, and the developers themselves. These requests may

reflect suggestions and feedback from customers or analyses of what is

offered by competing products.

The customer support team may submit change requests associated with

bugs that

have been discovered and reported by customers after the software has

been released.

Customers may use a web page or email to report bugs. A bug

management team then

checks that the bug reports are valid and translates them into formal

system change

requests. Marketing staff may meet with customers and investigate

competitive products.

They may suggest changes that should be included to make it easier to sell

a new version of a system to new and existing customers. The system

developers themselves may have some good ideas about new features that

can be added to the system.

The change request process shown in Figure 25.14 is initiated after a

system has

been released to customers. During development, when new versions of

the system

are created through daily (or more frequent) system builds, there is no

need for a

formal change management process. Problems and requested changes are

recorded

in an issue tracking system and discussed in daily meetings. Changes that

only affect individual components are passed directly to the system

developer, who either

accepts them or makes a case for why they are not required. However, an

independent


// SICSA project (XEP 6087)
//
// APP-SYSTEM/AUTH/RBAC/USER_ROLE
//
// Object: currentRole
// Author: R. Looek
// Creation date: 13/11/2012
//
// © St Andrews University 2012
//
// Modification history
// Version   Modifier    Date         Change       Reason
// 1.0       J. Jones    11/11/2009   Add header   Submitted to CM
// 1.1       R. Looek    13/11/2009   New field    Change req. R07/02

Figure 25.16 Derivation history

authority, such as the system architect, should assess and prioritize

changes that cut across system modules that have been produced by

different development teams.

In some agile methods, customers are directly involved in deciding

whether a change

should be implemented. When they propose a change to the system

requirements, they

work with the team to assess the impact of that change and then decide

whether the

change should take priority over the features planned for the next

increment of the system. However, changes that involve software

improvement are left to the discretion of the programmers working on the

system. Refactoring, where the software is continually improved, is not

seen as an overhead but as a necessary part of the development process.

As the development team changes software components, they should

maintain a

record of the changes made to each component. This is sometimes called

the derivation history of a component. A good way to keep the derivation

history is in a standardized comment at the beginning of the component

source code (Figure 25.16). This comment should reference the change

request that triggered the software change. These

comments can be processed by scripts that scan all components for the

derivation histories and then generate component change reports. For

documents, records of changes incorporated in each version are usually

maintained in a separate page at the front of the document. I discuss this

in the web chapter on documentation (Chapter 30).
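A script of the kind described, which scans components for derivation-history comments and generates a change report, might look like the following sketch. It assumes a comment layout like that of Figure 25.16; the regular expression and the file extension are illustrative assumptions, not a standard format.

```python
import re
from pathlib import Path

# Sketch of a derivation-history scanner. It assumes each component header
# contains lines of the form "// <version> <modifier> <date> <change/reason>",
# as in Figure 25.16. The exact layout is an assumption for illustration.
HISTORY_LINE = re.compile(
    r"^//\s*(\d+\.\d+)\s+(.+?)\s+(\d{2}/\d{2}/\d{4})\s+(.*)$")

def derivation_history(source_text):
    """Extract (version, modifier, date, change) records from header comments."""
    records = []
    for line in source_text.splitlines():
        match = HISTORY_LINE.match(line)
        if match:
            records.append(match.groups())
    return records

def change_report(directory):
    """One report line per recorded change, across all components."""
    report = []
    for path in Path(directory).rglob("*.c"):  # file extension is assumed
        for version, modifier, date, change in derivation_history(path.read_text()):
            report.append(f"{path.name} {version} {date} {modifier}: {change}")
    return report
```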

25.4 Release management

A system release is a version of a software system that is distributed to

customers.

For mass-market software, it is usually possible to identify two types of

release:

major releases, which deliver significant new functionality, and minor

releases,

which repair bugs and fix customer problems that have been reported. For

example,

this book is being written on an Apple Mac computer where the operating

system is

OS 10.9.2. This means minor release 2 of major release 9 of OS 10. Major

releases

are very important economically to the software vendor, as customers

usually have

to pay for them. Minor releases are usually distributed free of charge.
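The major/minor numbering scheme can be decomposed mechanically. A minimal sketch, with invented names for the parts of the identifier:

```python
# Sketch: decompose a release identifier such as "10.9.2", meaning minor
# release 2 of major release 9 of product line 10. Key names are invented.

def parse_release(identifier):
    product_line, major, minor = (int(part) for part in identifier.split("."))
    return {"product_line": product_line, "major": major, "minor": minor}

release = parse_release("10.9.2")
# Customers usually pay for major releases; minor (bug-fix) releases are free.
is_paid_upgrade = release["minor"] == 0
print(release, is_paid_upgrade)
```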


A software product release is not just the executable code of the system.

The

release may also include:

- configuration files defining how the release should be configured for particular installations;
- data files, such as files of error messages in different languages, that are needed for successful system operation;
- an installation program that is used to help install the system on target hardware;
- electronic and paper documentation describing the system;
- packaging and associated publicity that have been designed for that release.

Preparing and distributing a system release for mass-market products is an

expensive process. In addition to the technical work involved in creating a

release distribution, advertising and publicity material have to be

prepared. Marketing strategies may have to be designed to convince

customers to buy the new release of the system. Careful

thought must be given to release timing. If releases are too frequent or

require hardware upgrades, customers may not move to the new release,

especially if they have to pay for it. If system releases are infrequent,

market share may be lost as customers move to alternative systems.

The various technical and organizational factors that you should take into

account

when deciding on when to release a new version of a software product are

shown in

Figure 25.17.

Release creation is the process of creating the collection of files and

documentation that includes all components of the system release. This

process involves several steps:

1. The executable code of the programs and all associated data files must be identified in the version control system and tagged with the release identifier.

2. Configuration descriptions may have to be written for different

hardware and

operating systems.

3. Updated instructions may have to be written for customers who need to

config-

ure their own systems.

4. Scripts for the installation program may have to be written.

5. Web pages have to be created describing the release, with links to

system

documentation.

6. Finally, when all information is available, an executable master image

of the

software must be prepared and handed over for distribution to customers

or

sales outlets.
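Steps 1 and 6 of this process might be scaffolded by a release script along the following lines. The manifest format, tag naming, and file names are invented for illustration; real release tooling would integrate with the version control system itself.

```python
import hashlib
import json

# Hypothetical release-creation sketch: tag the files making up a release
# (step 1) and record a master manifest for distribution (step 6).

def create_release(release_id, files):
    """files maps a path to its content; returns a distributable manifest."""
    manifest = {"release": release_id, "components": {}}
    for path, content in sorted(files.items()):
        digest = hashlib.sha256(content.encode()).hexdigest()
        manifest["components"][path] = {"tag": f"{release_id}/{path}",
                                        "sha256": digest}
    return json.dumps(manifest, indent=2)

files = {"bin/app": "executable code", "data/errors.en": "error messages"}
print(create_release("REL-2.1", files))
```

Recording a content digest for every component makes it possible to check later that a distributed release matches what was built.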

For custom software or software product lines, the complexity of the

system release

management process depends on the number of system customers. Special

releases

of the system may have to be produced for each customer. Individual

customers

may be running several different releases of the system at the same time

on different hardware. Where the software is part of a complex system of systems, several different variants of the individual systems may have to be created. For example, in specialized fire-fighting vehicles, each type of vehicle may have its own version of a software system that is adapted to the equipment in that vehicle.

Figure 25.17 Factors influencing system release planning

Competition: For mass-market software, a new system release may be necessary because a competing product has introduced new features and market share may be lost if these are not provided to existing customers.

Marketing requirements: The marketing department of an organization may have made a commitment for releases to be available at a particular date. For marketing reasons, it may be necessary to include new features in a system so that users can be persuaded to upgrade from a previous release.

Platform changes: You may have to create a new release of a software application when a new version of the operating system platform is released.

Technical quality of the system: If serious system faults are reported that affect the way in which many customers use the system, it may be necessary to correct them in a new system release. Minor system faults may be repaired by issuing patches, distributed over the Internet, which can be applied to the current release of the system.

A software company may have to manage tens or even hundreds of

different

releases of their software. Their configuration management systems and

processes

have to be designed to provide information about which customers have

which

releases of the system and the relationship between releases and system

versions. In the event of a problem with a delivered system, you have to

be able to recover all of the component versions used in that specific

system.

Therefore, when a system release is produced, it must be documented to

ensure

that it can be re-created exactly in the future. This is particularly

important for customized, long-lifetime embedded systems, such as

military systems and those that

control complex machines. These systems may have a long lifetime—30

years in

some cases. Customers may use a single release of these systems for many

years and

may require specific changes to that release long after it has been

superseded.

To document a release, you have to record the specific versions of the

source

code components that were used to create the executable code. You must

keep copies of the source code files, corresponding executables, and all data and configuration files. It may be necessary to keep copies of older operating systems

and other

support software because they may still be in operational use. Fortunately,

this no

longer means that old hardware always has to be maintained. The older

operating

systems can run in a virtual machine.

You should also record the versions of the operating system, libraries,

compilers,

and other tools used to build the software. These tools may be required in

order to

build exactly the same system at some later date. Accordingly, you may

have to store copies of the platform software and the tools used to create

the system in the version control system, along with the source code of the

target system.
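A release record capturing this information might be structured as in the sketch below; all field names and version numbers are invented for illustration.

```python
# Hypothetical release record: everything needed to re-create this exact
# release at a later date. Field names and versions are invented.
release_record = {
    "release": "REL-2.1",
    "source_versions": {"auth.c": "1.1", "billing.c": "2.4"},
    "data_files": ["errors.en", "errors.fr"],
    "configuration_files": ["server.cfg"],
    "build_environment": {                    # tools needed to rebuild it
        "operating_system": "HostOS 12.1",    # may later run in a VM
        "compiler": "cc 9.3",
        "libraries": {"libcrypto": "3.0"},
    },
}
```

Keeping the build environment in the record reflects the point above: the platform software and tools may have to be stored in the version control system alongside the source code.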

When planning the installation of new system releases, you cannot assume

that customers will always install new system releases. Some system users may be

happy with


an existing system and may not consider it worthwhile to absorb the cost

of changing to a new release. New releases of the system cannot,

therefore, rely on the installation of previous releases. To illustrate this

problem, consider the following scenario:

1. Release 1 of a system is distributed and put into use.

2. Release 2 requires the installation of new data files, but some customers

do not need the facilities of release 2 and so remain with release 1.

3. Release 3 requires the data files installed in release 2 and has no new

data files of its own.

The software distributor cannot assume that the files required for release 3

have

already been installed in all sites. Some sites may go directly from release

1 to

release 3, skipping release 2. Some sites may have modified the data files

associated with release 2 to reflect local circumstances. Therefore, the data

files must be distributed and installed with release 3 of the system.
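This scenario implies that the release 3 installer must ship the release 2 data files itself and install whichever ones are missing, without overwriting copies that sites have modified locally. A sketch of that defensive check, with invented file naming:

```python
from pathlib import Path
import shutil

# Sketch: the new release's installer bundles the earlier release's data
# files and installs any that are missing, but never overwrites a copy
# that may have been modified to reflect local circumstances.

def install_data_files(bundle_dir, install_dir):
    installed = []
    for source in Path(bundle_dir).glob("*.dat"):  # extension is assumed
        target = Path(install_dir) / source.name
        if not target.exists():      # this site skipped the earlier release
            shutil.copy(source, target)
            installed.append(source.name)
        # else: keep the site's (possibly locally modified) copy
    return installed
```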

One benefit of delivering software as a service (SaaS) is that it avoids all

of these problems. It simplifies both release management and system

installation for customers.

The software developer is responsible for replacing the existing release of

a system with a new release, which is made available to all customers at

the same time. However, this approach requires that all servers running

the services be updated at the same time.

To support server updates, specialized distribution management tools such

as Puppet

(Loope 2011) have been developed for “pushing” new software to servers.

Key Points

Configuration management is the management of an evolving software

system. When maintaining a system, a CM team is put in place to ensure

that changes are incorporated into the system in a controlled way and that

records are maintained with details of the changes that have been

implemented.

The main configuration management processes are concerned with

version control, system building, change management, and release

management. Software tools are available to support all of these processes.

Version control involves keeping track of the different versions of

software components that are created as changes are made to them.

System building is the process of assembling system components into an

executable program to run on a target computer system.

Software should be frequently rebuilt and tested immediately after a

new version has been built. This makes it easier to detect bugs and

problems that have been introduced since the last build.

Change management involves assessing proposals for changes from

system customers and other stakeholders and deciding if it is cost-effective

to implement these changes in a new release of a system.


System releases include executable code, data files, configuration files,

and documentation.

Release management involves making decisions on system release dates,

preparing all

information for distribution and documenting each system release.

Further Reading

Software Configuration Management Patterns: Effective Teamwork, Practical

Integration. A relatively short, easy-to-read book that gives good practical

advice on configuration management practice, especially for agile methods

of development. (S. P. Berczuk with B. Appleton, Addison-Wesley, 2003).

“Agile Configuration Management for Large Organizations.” This web

article describes configuration management practices that can be used in

agile development processes, with a particular emphasis

on how these can scale to large projects and companies. (P. Schuh, 2007).

http://www.ibm.com/

developerworks/rational/library/mar07/schuh/index.html

Configuration Management Best Practices. This is a nicely written book that

presents a broader view of configuration management than I have

discussed here, including hardware configuration management. It’s geared

to large systems projects and does not really cover agile development

issues. (Bob Aiello and Leslie Sachs, Addison-Wesley, 2011).

“A Behind the Scenes Look at Facebook Release Engineering.” This is an

interesting article that covers the problems of releasing new versions of

large systems in the cloud, something that I haven’t discussed in this

chapter. The challenge here is to make sure that all of the servers are

updated at the same time so

that users don’t see different versions of the system. (P. Ryan,

arstechnica.com, 2012). http://arstechnica.

com/business/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-

release-engineering/

“Git SVN Comparison.” This wiki compares the Git and Subversion version

control systems. (2013,

https://git.wiki.kernel.org/index.php/GitSvnComparsion).

Website

PowerPoint slides for this chapter:

www.pearsonglobaleditions.com/Sommerville

Links to supporting videos:

http://software-engineering-book.com/videos/software-management/

Exercises

25.1. Suggest five possible problems that could arise if a company does

not develop effective configuration management policies and processes.

25.2. In version management, what do codeline and baseline

terminologies stand for? List the features included in a version control

system.


25.3. Imagine a situation where two developers are simultaneously

modifying three different software components. What difficulties might

arise when they try to merge the changes they have made?

25.4. Software is now often developed by distributed teams, with team

members working at different locations and in different time zones.

Suggest features in a version control system that could be included to

support distributed software development.

25.5. Describe the difficulties that may arise when building a system from

its components. What particular problems might occur when a system is

built on a host computer for some target machine?

25.6. With reference to system building, explain why you may sometimes

have to maintain obsolete computers on which large software systems

were developed.

25.7. A common problem with system building occurs when physical

filenames are incorporated in system code and the file structure implied in

these names differs from that of the target machine. Write a set of

programmer’s guidelines that helps avoid this and any other system-

building problems that you can think of.

25.8. What are the factors that influence the decision on whether or not a

change should be implemented?

25.9. Describe six essential features that should be included in a tool to

support change management processes.

25.10. Explain why preparing and distributing a system release for mass-

market products is an expensive process.

References

Aiello, B., and L. Sachs. 2011. Configuration Management Best Practices.

Boston: Addison-Wesley.

Bamford, R., and W. J. Deibler. 2003. “ISO 9001:2000 for Software and

Systems Providers: An Engineering Approach.” Boca Raton, FL: CRC Press.

Chrissis, M. B., M. Konrad, and S. Shrum. 2011. CMMI for Development:

Guidelines for Process Integration and Product Improvement, 3rd ed. Boston:

Addison-Wesley.

IEEE. 2012. “IEEE Standard for Configuration Management in Systems and Software Engineering” (IEEE Std 828-2012). doi:10.1109/IEEESTD.2012.6170935.

Loeliger, J., and M. McCullough. 2012. Version Control with Git: Powerful

Tools and Techniques for Collaborative Software Development. Sebastopol,

CA: O’Reilly and Associates.

Loope, J. 2011. Managing Infrastructure with Puppet. Sebastopol, CA:

O’Reilly and Associates.

Pilato, C., B. Collins-Sussman, and B. Fitzpatrick. 2008. Version Control

with Subversion.

Sebastopol, CA: O’Reilly and Associates.

Smart, J. F. 2011. Jenkins: The Definitive Guide. Sebastopol, CA: O’Reilly

and Associates.

This page intentionally left blank

Glossary

abstract data type

A type that is defined by its operations rather than its representation. The

representation is private and may only be accessed by the defined

operations.
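For example, a stack can be defined purely by its push and pop operations, with the underlying representation kept private. A minimal sketch:

```python
class Stack:
    """Abstract data type: clients use push/pop/is_empty,
    never the private representation."""

    def __init__(self):
        self._items = []          # private representation

    def push(self, value):
        self._items.append(value)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items

s = Stack()
s.push(1)
s.push(2)
print(s.pop())  # 2
```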

acceptance testing

Customer tests of a system to decide if it is adequate to meet their needs

and so

should be accepted from a supplier.

activity chart

A chart used by project managers to show the dependencies between tasks

that have

to be completed. The chart shows the tasks, the time expected to complete

these tasks and the task dependencies. The critical path is the longest path

(in terms of the time required to complete the tasks) through the activity

chart. The critical path defines the minimum time required to complete

the project. Sometimes called a PERT chart.

Ada

A programming language that was developed for the US Department of

Defense in

the 1980s as a standard language for developing military software. It is

based on

programming language research from the 1970s and includes constructs

such as

abstract data types and support for concurrency. It is still used for large,

complex military and aerospace systems.

agile manifesto

A set of principles encapsulating the ideas underlying agile methods of

software

development.

agile methods

Methods of software development that are geared to rapid software

delivery. The

software is developed and delivered in increments, and process

documentation and


bureaucracy are minimized. The focus of development is on the code

itself, rather

than supporting documents.

algorithmic cost modeling

An approach to software cost estimation where a formula is used to

estimate the project cost. The parameters in the formula are attributes of

the project and the software itself.

application family

A set of software application programs that have a common architecture

and

generic functionality. These can be tailored to the needs of specific

customers by

modifying components and program parameters.

application framework

A set of reusable concrete and abstract classes that implement features

common to

many applications in a domain (e.g. user interfaces). The classes in the

application framework are specialized and instantiated to create an

application.

application program interface (API)

An interface, generally specified as a set of operations, that allows access

to an

application program’s functionality. This means that this functionality can

be called on directly by other programs and not just accessed through the

user interface.

architectural pattern (style)

An abstract description of a software architecture that has been tried and

tested in a number of different software systems. The pattern description

includes information

about where it is appropriate to use the pattern and the organization of

the components of the architecture.

architectural view

A description of a software architecture from a particular perspective.

availability

The readiness of a system to deliver services when requested. Availability

is usu-

ally expressed as a decimal number, so an availability of 0.999 means that

the sys-

tem can deliver services for 999 out of 1000 time units.

B

A formal method of software development that is based on implementing

a system

by systematic transformation of a formal system specification.

bar chart (Gantt chart)

A chart used by project managers to show the project tasks, the schedule

associated

with these tasks and the people who will work on them. It shows the tasks’

start and end dates and the staff allocations against a timeline.

black-box testing

An approach to testing where the testers have no access to the source code

of a

system or its components. The tests are derived from the system

specification.


BPMN

Business Process Modeling Notation. A notation for defining workflows

that

describe business processes and service composition.

brownfield software development

The development of software for an environment where there are several

existing

systems that the software being developed must integrate with.

C

A programming language that was originally developed to implement the

Unix sys-

tem. C is a relatively low-level system implementation language that

allows access

to the system hardware and which can be compiled to efficient code. It is

widely

used for low-level systems programming and embedded systems

development.

C++

An object-oriented programming language that is a superset of C.

C#

An object-oriented programming language, developed by Microsoft, that

has much

in common with C++, but which includes features that allow more

compile-time

type checking.

Capability Maturity Model (CMM)

The Software Engineering Institute’s Capability Maturity Model, which is

used to

assess the level of software development maturity in an organization. It

has now

been superseded by CMMI, but is still widely used.

Computer-Aided Software Engineering (CASE)

The term that was invented in the 1980s to describe the process of developing software using automated tool support. Virtually all software development is now reliant on tool support, so the term CASE is no longer widely used.

CASE tool

A software tool, such as a design editor or a program debugger, used to

support an

activity in the software development process.

CASE workbench

An integrated set of CASE tools that work together to support a major

process

activity such as software design or configuration management. Now often

called a

programming environment.

change management

A process to record, check, analyze, estimate and implement proposed

changes to a

software system.

class diagram

A UML diagram type that shows the object classes in a system and their

relationships.


client–server architecture

An architectural model for distributed systems where the system

functionality is

offered as a set of services provided by a server. These are accessed by

client computers that make use of the services. Variants of this approach, such as

three-tier

client–server architectures, use multiple servers.

cloud computing

The provision of computing and/or application services over the Internet

using a

‘cloud’ of servers from an external provider. The ‘cloud’ is implemented

using a

large number of commodity computers and virtualization technology to

make

effective use of these systems.

CMMI

An integrated approach to process capability maturity modeling based on

the adoption of good software engineering practice and integrated quality

management. It

supports discrete and continuous maturity modeling and integrates

systems and

software engineering process maturity models. Developed from the

original Capability Maturity Model.

COCOMO II

See Constructive Cost Modeling.

code of ethics and professional practice

A set of guidelines that set out expected ethical and professional behavior

for

software engineers. This was defined by the major US professional

societies (the

ACM and the IEEE) and defines ethical behavior under eight headings:

public,

client and employer, product, judgment, management, colleagues,

profession

and self.

Common Request Broker Architecture (CORBA)

A set of standards proposed by the Object Management Group (OMG) that

defines

distributed component models and communications. Influential in the

development

of distributed systems but no longer widely used.

component

A deployable, independent unit of software that is completely defined and

accessed

through a set of interfaces.

component model

A set of standards for component implementation, documentation and

deployment. These cover the specific interfaces that may be provided by a component, component naming, component interoperation and component

composition. Component models provide the basis for middleware to

support

executing components.

component-based software engineering (CBSE)

The development of software by composing independent, deployable

software

components that are consistent with a component model.


conceptual design

The development of a high-level vision of a complex system and a

description of

its essential capabilities. Designed to be understood by people who are not

systems engineers.

configurable application system

An application system product, developed by a system vendor, that offers

function-

ality that may be configured for use in different companies and

environments.

configuration item

A machine-readable unit, such as a document or a source code file, that is

subject to change and where the change has to be controlled by a

configuration management system.

configuration management

The process of managing the changes to an evolving software product.

Configuration management involves version management, system building, change management and release management.

Constructive Cost Modeling (COCOMO)

A family of algorithmic cost estimation models. COCOMO was first

proposed in

the early 1980s and has been modified and updated since then to reflect new technology and changing software engineering practice. COCOMO II is its

latest

instantiation and is a freely available algorithmic cost estimation model

that is supported by open source software tools.

CORBA

See Common Request Broker Architecture.

control metric

A software metric that allows managers to make planning decisions based

on information about the software process or the software product that is being

developed.

Most control metrics are process metrics.

critical system

A computer system whose failure can result in significant economic,

human or

environmental losses.

COTS system

A Commercial Off-the-Shelf system. The term COTS is now mostly used in

military systems. See configurable application system.

CVS

A widely used, open-source software tool used for version management.

data processing system

A system that aims to process large amounts of structured data. These

systems usually process the data in batches and follow an input-process-output model. Examples of

data processing systems are billing and invoicing systems, and payment

systems.


denial of service attack

An attack on a web-based software system that attempts to overload the

system so

that it cannot provide its normal service to users.

dependability

The dependability of a system is an aggregate property that takes into

account the

system’s safety, reliability, availability, security, resilience and other

attributes. The dependability of a system reflects the extent to which it can

be trusted by its users.

dependability requirement

A system requirement that is included to help achieve the required

dependability for a system. Non-functional dependability requirements

specify dependability attribute

values; functional dependability requirements are functional requirements

that

specify how to avoid, detect, tolerate or recover from system faults and

failures.

dependability case

A structured document that is used to back up claims made by a system

developer

about the dependability of a system. Specific types of dependability case

are safety cases and security cases.

design pattern

A well-tried solution to a common problem that captures experience and

good practice in a form that can be reused. It is an abstract representation that can

be instantiated in a number of ways.
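To make the idea of instantiating an abstract pattern concrete, here is a minimal sketch (not from the book) of the widely documented Observer pattern in Python. The class and method names (Subject, LoggingObserver, update) are illustrative choices, not a standard API.

```python
class Subject:
    """Maintains a list of observers and notifies them of state changes."""
    def __init__(self):
        self._observers = []
        self._state = None

    def attach(self, observer):
        self._observers.append(observer)

    def set_state(self, state):
        self._state = state
        for observer in self._observers:  # notify every registered observer
            observer.update(state)


class LoggingObserver:
    """A concrete observer that records every state change it is told about."""
    def __init__(self):
        self.seen = []

    def update(self, state):
        self.seen.append(state)


subject = Subject()
log = LoggingObserver()
subject.attach(log)
subject.set_state("ready")
subject.set_state("running")
```

The same abstract pattern could equally be instantiated with GUI widgets observing a data model, which is the sense in which a pattern is reused rather than copied.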

digital learning environment

An integrated set of software tools, educational applications and content that is geared to support learning.

distributed system

A software system where the software sub-systems or components execute on different processors.

domain

A specific problem or business area where software systems are used. Examples of domains include real-time control, business data processing and telecommunications switching.

domain model

A definition of domain abstractions, such as policies, procedures, objects, relationships and events. It serves as a base of knowledge about some problem area.

DSDM

Dynamic System Development Method. Claimed to be one of the first agile development methods.

embedded system

A software system that is embedded in a hardware device, e.g., the software system in a cell phone. Embedded systems are usually real-time systems and so have to respond in a timely way to events occurring in their environment.

emergent property

A property that only becomes apparent once all of the components of the system have been integrated to create the system.

Enterprise Java Beans (EJB)

A Java-based component model.

enterprise resource planning (ERP) system

A large-scale software system that includes a range of capabilities to support the operation of business enterprises and which provides a means of sharing information across these capabilities. For example, an ERP system may include support for supply chain management, manufacturing and distribution. ERP systems are configured to the requirements of each company using the system.

ethnography

An observational technique that may be used in requirements elicitation and analysis. The ethnographer immerses him or herself in the users’ environment and observes their day-to-day work habits. Requirements for software support can be inferred from these observations.

event-based systems

Systems where the control of operation is determined by events that are generated in the system’s environment. Most real-time systems are event-based systems.

extreme programming (XP)

A widely-used agile method of software development that includes practices such as scenario-based requirements, test-first development and pair programming.

fault avoidance

Developing software in such a way that faults are not introduced into that software.

fault detection

The use of processes and run-time checking to detect and remove faults in a program before these result in a system failure.

fault tolerance

The ability of a system to continue in execution even after faults have occurred.
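One simple fault-tolerance tactic is a protective wrapper that retries a failing component and then switches to a diverse spare; the sketch below (an illustration, not the book's design) shows the idea. The function and sensor names are hypothetical.

```python
def fault_tolerant_call(primary, fallback, retries=2):
    """Try the primary component; on failure retry, then fall back to a
    diverse secondary implementation so execution can continue."""
    for _ in range(retries):
        try:
            return primary()
        except Exception:
            continue          # assume a transient fault and retry
    return fallback()         # persistent fault: switch to the spare


calls = {"n": 0}

def flaky_sensor():
    """Hypothetical sensor that fails on its first read, then recovers."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient read error")
    return 21.5

def backup_sensor():
    """Diverse backup implementation returning a coarser reading."""
    return 21.0


reading = fault_tolerant_call(flaky_sensor, backup_sensor)
```

Real fault-tolerant systems rely on redundant and diverse components rather than a single retry loop, but the control flow is the same in spirit: detect the fault, then continue with an alternative.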

fault-tolerant architectures

System architectures that are designed to allow recovery from software faults. These are based on redundant and diverse software components.

formal methods

Methods of software development where the software is modeled using formal mathematical constructs such as predicates and sets. Formal transformation converts this model to code. Mostly used in the specification and development of critical systems.

Gantt chart

See bar chart.

Git

A distributed version management and system building tool where developers take complete copies of the project repository to allow concurrent working.

GitHub

A server that maintains a large number of Git repositories. Repositories may be private or public. The repositories for many open-source projects are maintained on GitHub.

hazard

A condition or state in a system that has the potential to cause or contribute to an accident.

host-target development

A mode of software development where the software is developed on a separate computer from where it is executed. The normal approach to development for embedded and mobile systems.

iLearn system

A digital learning environment to support learning in schools. Used as a case study in this book.

incremental development

An approach to software development where the software is delivered and deployed in increments.

information hiding

Using programming language constructs to conceal the representation of data structures and to control external access to these structures.
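As a minimal sketch of this idea (not taken from the book), the Python class below hides the list used to store a stack's elements behind a small public interface; the class and attribute names are illustrative.

```python
class Stack:
    """The list holding the elements is hidden; clients use only push/pop."""
    def __init__(self):
        self.__items = []   # name-mangled attribute, not part of the interface

    def push(self, item):
        self.__items.append(item)

    def pop(self):
        return self.__items.pop()

    def __len__(self):
        return len(self.__items)


s = Stack()
s.push(1)
s.push(2)
top = s.pop()
```

Because clients never touch the internal list directly, the representation could later be changed (say, to a linked structure) without affecting any code that uses the stack.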

inspection

See program inspection.

insulin pump

A software-controlled medical device that can deliver controlled doses of insulin to people suffering from diabetes. Used as a case study in this book.

integrated application system

An application system that is created by integrating two or more configurable application systems or legacy systems.

interface

A specification of the attributes and operations associated with a software component. The interface is used as the means of accessing the component’s functionality.

ISO 9000/9001

A set of standards for quality management processes that is defined by the International Organization for Standardization (ISO). ISO 9001 is the ISO standard that is most applicable to software development. These may be used to certify the quality management processes in an organization.

iterative development

An approach to software development where the processes of specification, design, programming and testing are interleaved.

J2EE

Java 2 Platform Enterprise Edition. A complex middleware system that supports the development of component-based web applications in Java. It includes a component model for Java components, APIs, services, etc.

Java

A widely used object-oriented programming language that was designed by Sun (now Oracle) with the aim of platform independence.

language processing system

A system that translates one language into another. For example, a compiler is a language-processing system that translates program source code to object code.

legacy system

A socio-technical system that is useful or essential to an organization but which has been developed using obsolete technology or methods. Because legacy systems often perform critical business functions, they have to be maintained.

Lehman’s Laws

A set of hypotheses about the factors that influence the evolution of complex software systems.

maintenance

The process of making changes to a system after it has been put into operation.

mean time to failure (MTTF)

The average time between observed system failures. Used in reliability specification.

Mentcare system

Mental Health Care Patient Management System. This is a system used to record information about consultations and treatments prescribed for people suffering from mental health problems. Used as a case study in this book.

middleware

The infrastructure software in a distributed system. It helps manage interactions between the distributed entities in the system and the system databases. Examples of middleware are an object request broker and a transaction management system.

misuse case

A description of a possible attack on a system that is associated with a system use case.

model-driven architecture (MDA)

An approach to software development based on the construction of a set of system models, which can be automatically or semi-automatically processed to generate an executable system.

model checking

A method of static verification where a state model of a system is exhaustively analyzed in an attempt to discover unreachable states.

model-driven development (MDD)

An approach to software engineering centered around system models that are expressed in the UML, rather than programming language code. This extends MDA to consider activities other than development such as requirements engineering and testing.

multi-tenant databases

Databases where information from several different organizations is stored in the same database. Used in the implementation of software as a service.

mutual exclusion

A mechanism to ensure that a concurrent process maintains control of memory until updates or accesses have been completed.
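A common way to realize mutual exclusion in practice is a lock around the shared data; the sketch below (an illustration, not from the book) uses Python's standard threading.Lock so that only one thread at a time can update a shared counter.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:          # only one thread may execute this update at a time
            counter += 1

# Four threads each perform 10,000 locked increments of the shared counter.
threads = [threading.Thread(target=increment, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every update happens inside the lock, the final value is deterministic; without the lock, concurrent read-modify-write sequences could interleave and lose updates.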

.NET

A very extensive framework used to develop applications for Microsoft Windows systems. Includes a component model that defines standards for components in Windows systems and associated middleware to support component execution.

object class

An object class defines the attributes and operations of objects. Objects are created at run-time by instantiating the class definition. The object class name can be used as a type name in some object-oriented languages.

object model

A model of a software system that is structured and organized as a set of object classes and the relationships between these classes. Various different perspectives on the model may exist such as a state perspective and a sequence perspective.

object-oriented (OO) development

An approach to software development where the fundamental abstractions in the system are independent objects. The same type of abstraction is used during specification, design and development.

object constraint language (OCL)

A language that is part of the UML, used to define predicates that apply to object classes and interactions in a UML model. The use of the OCL to specify components is a fundamental part of model-driven development.

Object Management Group (OMG)

A group of companies formed to develop standards for object-oriented development. Examples of standards promoted by the OMG are CORBA, UML and MDA.

open source

An approach to software development where the source code for a system is made public and external users are encouraged to participate in the development of the system.

operational profile

A set of artificial system inputs that reflect the pattern of inputs that are processed in an operational system. Used in reliability testing.

pair programming

A development situation where programmers work in pairs, rather than individually, to develop code. A fundamental part of extreme programming.

peer-to-peer system

A distributed system where there is no distinction between clients and servers. Computers in the system can act as both clients and servers. Peer-to-peer applications include file sharing, instant messaging and cooperation support systems.

People Capability Maturity Model (P-CMM)

A process maturity model that reflects how effective an organization is at managing the skills, training and experience of the people in that organization.

plan-driven process

A software process where all of the process activities are planned before the software is developed.

planning game

An approach to project planning based on estimating the time required to implement user stories. Used in some agile methods.

predictor metric

A software metric that is used as a basis for making predictions about the characteristics of a software system, such as its reliability or maintainability.

probability of failure on demand (POFOD)

A reliability metric that is based on the likelihood of a software system failing when a demand for its services is made.
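The reliability metrics defined in this glossary are simple ratios, which a short sketch can make concrete. The functions below follow the definitions given here (POFOD as failed demands over total demands; ROCOF, defined under its own entry, as failures per unit of observation time); the sample figures are hypothetical.

```python
def pofod(failed_demands, total_demands):
    """Probability of failure on demand: fraction of service demands that fail."""
    return failed_demands / total_demands

def rocof(failures, observation_time):
    """Rate of occurrence of failure: failures per unit of observation time."""
    return failures / observation_time

# Hypothetical reliability test data: 2 failures observed in 1,000 demands
# made over 500 hours of operation.
p = pofod(2, 1000)   # 0.002 -> roughly 1 failure in every 500 demands
r = rocof(2, 500)    # 0.004 failures per hour of operation
```

Which metric is appropriate depends on the system: POFOD suits systems with occasional, discrete demands (such as a protection system); ROCOF suits systems processing a continuous stream of inputs.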

process improvement

Changing a software development process with the aim of making that process more efficient or improving the quality of its outputs. For example, if your aim is to reduce the number of defects in the delivered software, you might improve a process by adding new validation activities.

process model

An abstract representation of a process. Process models may be developed from various perspectives and can show the activities involved in a process, the artifacts used in the process, constraints that apply to the process, and the roles of the people enacting the process.

process maturity model

A model of the extent to which a process includes good practice and reflective and measurement capabilities that are geared to process improvement.

program evolution dynamics

The study of the ways in which an evolving software system changes. It is claimed that Lehman’s Laws govern the dynamics of program evolution.

program generator

A program that generates another program from a high-level, abstract specification. The generator embeds knowledge that is reused in each generation activity.

program inspection

A review where a group of inspectors examine a program, line by line, with the aim of detecting program errors. A checklist of common programming errors often drives inspections.

Python

A programming language with dynamic types, which is particularly well-suited to the development of web-based systems.

quality management (QM)

The set of processes concerned with defining how software quality can be achieved and how the organization developing the software knows that the software has met the required level of quality.

quality plan

A plan that defines the quality processes and procedures that should be used. This involves selecting and instantiating standards for products and processes and defining the system quality attributes that are most important.

rapid application development (RAD)

An approach to software development aimed at rapid delivery of the software. It often involves the use of database programming and development support tools such as screen and report generators.

rate of occurrence of failure (ROCOF)

A reliability metric that is based on the number of observed failures of a system in a given time period.

Rational Unified Process (RUP)

A generic software process model that presents software development as a four-phase iterative activity, where the phases are inception, elaboration, construction and transition. Inception establishes a business case for the system, elaboration defines the architecture, construction implements the system, and transition deploys the system in the customer’s environment.

real-time system

A system that has to recognize and process external events in ‘real-time’. The correctness of the system does not just depend on what it does but also on how quickly it does it. Real-time systems are usually organized as a set of concurrent processes.

reductionism

An engineering approach that relies on breaking down a problem to sub-problems, solving these sub-problems independently then integrating these solutions to create the solution to the larger problem.

reengineering

The modification of a software system to make it easier to understand and change. Reengineering often involves software and data restructuring and organization, program simplification and redocumentation.

reengineering, business process

Changing a business process to meet a new organizational objective such as reduced cost and faster execution.

refactoring

Modifying a program to improve its structure and readability without changing its functionality.
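A small before/after sketch (invented for illustration, not from the book) shows what "improving structure without changing functionality" means in practice; the function and constant names are hypothetical.

```python
# Before: duplicated arithmetic and a magic number obscure the intent.
def price_before(order):
    if order["qty"] > 100:
        return order["qty"] * order["unit"] * 0.9
    else:
        return order["qty"] * order["unit"]


# After refactoring: the same observable behavior, restructured with
# named constants and an intention-revealing helper function.
BULK_THRESHOLD = 100
BULK_DISCOUNT = 0.9

def subtotal(order):
    return order["qty"] * order["unit"]

def price_after(order):
    discount = BULK_DISCOUNT if order["qty"] > BULK_THRESHOLD else 1.0
    return subtotal(order) * discount
```

A regression test suite that passes before and after the change is the usual evidence that a refactoring has preserved functionality.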

reference architecture

A generic, idealized architecture that includes all the features that systems might incorporate. It is a way of informing designers about the general structure of that class of system rather than a basis for creating a specific system architecture.

release

A version of a software system that is made available to system customers.

reliability

The ability of a system to deliver services as specified. Reliability can be specified quantitatively as a probability of failure on demand or as the rate of occurrence of failure.

reliability growth modeling

The development of a model of how the reliability of a system changes (improves) as it is tested and program defects are removed.

requirement, functional

A statement of some function or feature that should be implemented in a system.

requirement, non-functional

A statement of a constraint or expected behavior that applies to a system. This constraint may refer to the emergent properties of the software that is being developed or to the development process.

requirements management

The process of managing changes to requirements to ensure that the changes made are properly analyzed and tracked through the system.

resilience

A judgement of how well a system can maintain the continuity of its critical services in the presence of disruptive events, such as equipment failure and cyberattacks.

REST

REST (Representational State Transfer) is a style of development based around simple client/server interaction which uses the HTTP protocol for communications. REST is based around the idea of an identifiable resource, which has a URI. All interaction with resources is based on HTTP POST, GET, PUT and DELETE. Widely used for implementing low overhead web services (RESTful services).
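The resource-plus-verbs idea can be sketched without any network code. The in-memory dispatcher below (an illustration invented for this entry; the store, URIs and status codes are hypothetical) maps the four HTTP-style verbs onto operations on URI-keyed resources.

```python
# A hypothetical in-memory resource store, keyed by URI, manipulated only
# through the four HTTP-style verbs.
store = {}

def handle(method, uri, body=None):
    """Dispatch a REST-style request and return (status_code, body)."""
    if method == "PUT":
        store[uri] = body                                # create or replace
        return 200, body
    if method == "GET":
        return (200, store[uri]) if uri in store else (404, None)
    if method == "DELETE":
        return (200, store.pop(uri)) if uri in store else (404, None)
    if method == "POST":
        # In this sketch, POST appends to a resource treated as a collection.
        store.setdefault(uri, []).append(body)
        return 200, store[uri]
    return 405, None                                     # method not allowed


handle("PUT", "/books/42", {"title": "Software Engineering"})
status, body = handle("GET", "/books/42")
```

A real RESTful service exposes the same uniform interface over HTTP, so clients need to know only the resource URIs and the standard verbs.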

revision control systems

See version control systems.

risk

An undesirable outcome that poses a threat to the achievement of some objective. A process risk threatens the schedule or cost of a process; a product risk is a risk that may mean that some of the system requirements may not be achieved. A safety risk is a measure of the probability that a hazard will lead to an accident.

risk management

The process of identifying risks, assessing their severity, planning measures to put in place if the risks arise and monitoring the software and the software process for risks.

Ruby

A programming language with dynamic types that is particularly well-suited to web application programming.

SaaS

See software as a service.

safety

The ability of a system to operate without behavior that may injure or kill people or damage the system’s environment.

safety case

A body of evidence and structured argument from that evidence that a system is safe and/or secure. Many critical systems must have associated safety cases that are assessed and approved by external regulators before the system is certified for use.

SAP

A German company that has developed a well-known and widely-used ERP system. The name is also given to the ERP system itself.

scenario

A description of one typical way in which a system is used or a user carries out some activity.

scenario testing

An approach to software testing where test cases are derived from a scenario of system use.

Scrum

An agile method of development, which is based on sprints – short development cycles. Scrum may be used as a basis for agile project management alongside other agile methods such as XP.

security

The ability of a system to protect itself against accidental or deliberate intrusion. Security includes confidentiality, integrity and availability.

SEI

Software Engineering Institute. A software engineering research and technology transfer center, founded with the aim of improving the standard of software engineering in US companies.

sequence diagram

A diagram that shows the sequence of interactions required to complete some operation. In the UML, sequence diagrams may be associated with use cases.

server

A program that provides a service to other (client) programs.

service

See web service.

socio-technical system

A system, including hardware and software components, that has defined operational processes followed by human operators and which operates within an organization. It is therefore influenced by organizational policies, procedures and structures.

software analytics

Automated analysis of static and dynamic data about software systems to discover relationships between these data. These relationships may provide insights about possible ways to improve the quality of the software.

software architecture

A model of the fundamental structure and organization of a software system.

software as a service (SaaS)

Software applications that are accessed remotely through a web browser rather than installed on local computers. Increasingly used to deliver application services to end-users.

software development life cycle

Often used as another name for the software process. Originally coined to refer to the waterfall model of the software process.

software metric

An attribute of a software system or process that can be expressed numerically and measured. Process metrics are attributes of the process such as the time taken to complete a task; product metrics are attributes of the software itself such as size or complexity.

software process

The activities and processes that are involved in developing and evolving a software system.

software product line

See application family.

spiral model

A model of a development process where the process is represented as a spiral, with each round of the spiral incorporating the different stages in the process. As you move from one round of the spiral to another, you repeat all of the stages of the process.

state diagram

A UML diagram type that shows the states of a system and the events that trigger a transition from one state to another.

static analysis

Tool-based analysis of a program’s source code to discover errors and anomalies. Anomalies, such as successive assignments to a variable with no intermediate use, may be indicators of programming errors.

structured method

A method of software design that defines the system models that should be developed, the rules and guidelines that should apply to these models and a process to be followed in developing the design.

Structured Query Language (SQL)

A standard language used for relational database programming.

Subversion

A widely-used, open source version control and system building tool that is available on a range of platforms.

Swiss cheese model

A model of system defenses against operator failure or cyberattack that takes vulnerabilities in these defenses into account.

system

A system is a purposeful collection of interrelated components, of different kinds, which work together to deliver a set of services to the system owner and users.

system building

The process of compiling the components or units that make up a system and linking these with other components to create an executable program. System building is normally automated so that recompilation is minimized. This automation may be built in to the language processing system (as in Java) or may involve software tools to support system building.

systems engineering

A process that is concerned with specifying a system, integrating its components and testing that the system meets its requirements. System engineering is concerned with the whole socio-technical system—software, hardware and operational processes—not just the system software.

system of systems

A system that is created by integrating two or more existing systems.

system testing

The testing of a completed system before it is delivered to customers.

test coverage

The effectiveness of system tests in testing the code of an entire system. Some companies have standards for test coverage, e.g. the system tests shall ensure that all program statements are executed at least once.

test-driven development

An approach to software development where executable tests are written before the program code. The set of tests is run automatically after every change to the program.
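The test-first cycle can be shown in miniature; the sketch below (an invented example, with a hypothetical is_leap function) writes the executable test before the code it exercises, then runs it.

```python
# Step 1: write an executable test for code that does not yet exist.
def test_leap_year():
    assert is_leap(2000)       # divisible by 400
    assert not is_leap(1900)   # divisible by 100 but not by 400
    assert is_leap(2024)       # divisible by 4
    assert not is_leap(2023)   # not divisible by 4


# Step 2: write just enough code to make the test pass.
def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Step 3: run the whole test set automatically after every change.
test_leap_year()
```

In practice the tests would be collected in a framework such as unittest or JUnit so that the entire set runs automatically on every change, as the definition requires.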

TOGAF

An architectural framework, supported by The Open Group, that is intended to support the development of enterprise architectures for systems of systems.

transaction

A unit of interaction with a computer system. Transactions are independent and atomic (they are not broken down into smaller units) and are a fundamental unit of recovery, consistency and concurrency.

transaction processing system

A system that ensures that transactions are processed in such a way that they do not interfere with each other and so that individual transaction failure does not affect other transactions or the system’s data.

Unified Modeling Language (UML)

A graphical language used in object-oriented development that includes several types of system model that provide different views of a system. The UML has become a de facto standard for object-oriented modeling.

unit testing

The testing of individual program units by the software developer or development team.

use case

A specification of one type of interaction with a system.

use-case diagram

A UML diagram type that is used to identify use-cases and graphically depict the users involved. It must be supplemented with additional information to completely describe use-cases.

user interface design

The process of designing the way in which system users can access system functionality, and the way that information produced by the system is displayed.

user story

A natural language description of a situation that explains how a system or systems might be used and the interactions with the systems that might take place.

validation

The process of checking that a system meets the needs and expectations of the customer.

verification

The process of checking that a system meets its specification.

version control

The process of managing changes to a software system and its components so that it is possible to know which changes have been implemented in each version of the component/system, and also to recover/recreate previous versions of the component/system.

version control (VC) systems

Software tools that have been developed to support the processes of version control. These may be based on either centralized or distributed repositories.

waterfall model

A software process model that involves discrete development stages: specification, design, implementation, testing and maintenance. In principle, one stage must be complete before progress to the next stage is possible. In practice, there is significant iteration between stages.

web service

An independent software component that can be accessed through the Internet using standard protocols. It is completely self-contained without external dependencies. XML-based standards such as SOAP (Simple Object Access Protocol), for web service information exchange, and WSDL (Web Service Definition Language), for the definition of web service interfaces, have been developed. However, the REST approach may also be used for web service implementation.

white-box testing

An approach to program testing where the tests are based on knowledge of the structure of the program and its components. Access to source code is essential for white-box testing.
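A short sketch (invented for this entry) makes the structural idea concrete: the test cases below are derived from the branches visible in the source code, so that every branch is executed at least once.

```python
def classify(temperature):
    """A function with three branches; white-box tests target each one."""
    if temperature < 0:
        return "freezing"
    elif temperature < 25:
        return "normal"
    else:
        return "hot"


# One test case per branch, chosen by reading the code structure, so that
# statement and branch coverage are both complete for this function.
branch_tests = [(-5, "freezing"), (10, "normal"), (30, "hot")]
results = [classify(t) == expected for t, expected in branch_tests]
```

Black-box testing, by contrast, would derive its cases from the specification alone, without looking at which branches exist in the source.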

wicked problem

A problem that cannot be completely specified or understood because of the complexity of the interactions between the elements that contribute to the problem.

wilderness weather system

A system to collect data about the weather conditions in remote areas. Used as a case study in this book.

workflow

A detailed definition of a business process that is intended to accomplish a certain task. The workflow is usually expressed graphically and shows the individual process activities and the information that is produced and consumed by each activity.

WSDL

An XML-based notation for defining the interface of web services.

XML

Extensible Markup Language. XML is a text markup language that supports the interchange of structured data. Each data field is delimited by tags that give information about that field. XML is now very widely used and has become the basis of protocols for web services.

XP

See extreme programming.

Z

A model-based, formal specification language developed at the University of Oxford in England.

Subject Index

A

abstraction level (reuse), 213
acceptability, 22, 347–48
acceptance testing, 77, 82, 249, 250–51, 252
accidents (mishaps), 343–44, 347
ACM/IEEE-CS Joint Task Force on Software Engineering Ethics and Professional Practices, 29–30
acquisition (procurement), 473, 553–54, 566–70
activities (software engineering activities), 20, 23, 44, 47–48, 54–61, 142, 298, 643–44. See also development; evolution; specification; validation
activity charts (planning), 678–80
activity diagrams (UML), 33–34, 47, 50, 56, 141, 143–44, 163
actuators, 218, 502, 613–14, 615
Ada programming language, 359
adaptors, 469, 482–83
additive composition, 481
Adobe Creative Suite, 27
aggregation, 153
agile methods, 45, 66, 72–100
  architectural design and, 168, 175
  change and, 76, 78, 91, 131–32
  change management and, 97, 748, 750
  configuration management (CM) for, 732, 742–43, 748, 750
  continuous integration, 742–43
  critical systems and, 75, 92, 96
  custom systems and, 90, 732
  customer involvement and, 76, 77, 91, 748, 750
  development team, 85, 90, 92–93
  documentation and, 73–75, 86, 89–90, 92–93, 175
  evolution and, 90, 261
  extreme programming (XP), 73, 77–84
  incremental development and, 45, 50, 73–74, 77
  large system complexity and, 93–96
  manifesto, 75–76, 77–78
  model-driven architecture (MDA) and, 162
  organizations and, 91, 97
  pair programming, 78, 83–84
  ‘people, not process’ and, 76, 77, 91
  plan-driven approach v., 45, 74–75, 91–93, 98
  principles of, 76
  process improvement and, 66
  project management and, 84–88, 643, 647, 661
  project planning, 91–93, 670, 680–83, 696
  quality management (QM), 714–16, 727
  refactoring, 51, 80–81
  risk management and, 647
  scaling, 88–97, 98
  Scrum approach and, 73, 78, 85–88, 96
  simplicity of, 76, 78, 91
  test first development, 59, 78, 81–83
  user stories for, 681–82
  user testing, 251
agile modeling, 50
Agile Scaling Model (ASM), 95
air traffic management (ATC) systems, 554–55, 569
Airbus 340 flight control system, 321–22, 340
AJAX programming, 28, 445, 512
algorithm error, 351–52
algorithmic cost modeling, 683, 684–86
alpha testing, 249
analysis systems, 25
Android, 219
Apache web server, 219
aperiodic stimuli, 613
Apollo 13 mission resilience, 409, 411, 416
application assessment (legacy systems), 269
application data, 262
application frameworks, 442, 443–46, 460
application layer, 292
application-level protection, 393–394
application programming interfaces (APIs), 39, 595–96
application security, 374–375
application software, 262
application system, 53, 438, 453–60
  COTS systems, 453
  ERP systems, 454–457
  reuse, 438, 442, 453–60
architectural description languages (ADLs), 175
architectural design, 57, 149, 167–195, 570–71, 595, 599–606
  block diagrams for, 170
  Booch’s architecture catalog and, 170
  decisions, 171–73, 192
  4+1 view model, 173–74
  levels of abstraction, 169
  maintenance and, 172–73, 178
  model-driven architecture (MDA), 159–62
  non-functional requirements for, 169, 172–73
  object-oriented systems, 201–02
  patterns, 175–84, 192
  refactoring and, 168
  security and, 172, 388, 392–395
  structural models for, 149
  system development and, 570–71
  systems of systems (SoS), 595, 599–606
  views, 173–75, 192
architectural frameworks, 600–02
architectural patterns (styles), 172
  client-server architecture, 180–82, 501, 503–06, 517
  container systems, 603–05
  data-feed systems, 602–03
  distributed component systems, 501, 506–09, 517
  distributed systems, 175–84, 192, 501–12, 517
  embedded software and, 620–26, 634
  environmental control, 620, 623–25
  layered architecture, 177–79
  master-slave architecture, 501–02
  model-view-controller (MVC), 176–77
  multi-tier client-server architecture, 501, 505–06
  observe and react, 620, 621–23
  peer-to-peer (p2p) architecture, 501, 509–12, 517
  pipe and filter architecture, 182–84
  process pipeline, 620, 625–26
  real-time software, 620–26, 634
  repository architecture, 179–80
  security and, 172, 388, 392–95
  systems of systems (SoS), 602–606, 607
  trading systems, 605–06
  two-tier client-server architecture, 501, 503–05
Architecture Development Method (ADM), 601
architectures (software architectures)
  application, 184–91, 192
  architecture in the large, 169
  architecture in the small, 169
  defined, 192
  distributed, 171, 182
  fault-tolerant, 318–25
  industrial practice v., 170
  pipe and filter compiler, 190–91
  reference, 191
  self-monitoring, 320–22
Ariane 5 explosion, 296, 479, 480
arithmetic error, 351
as low as reasonably practical (ALARP) risks, 347
aspect-oriented software development, 442
Assertion checking, 360
assessment
  hazards for safety requirements, 345, 346–349
  security risk, 381–82
assets, 377, 378, 413, 414–415
assurance
  safety processes, 353–56
  security testing and, 402–04
ATMs (automated teller machines), 186–87, 315–16
attacks, 377, 378–79, 389, 413, 414–15, 494–95
attributes of software, 20, 22, 40
authentication, 413, 414, 416
automated management, 423–24
automated testing, 78, 81–83, 233–34, 242, 252
automatic static analysis, 359–60
availability
  security and, 374, 375, 413
  system availability, 172, 288, 309–12
availability metric (AVAIL), 313–314
avoidance
  error discovery and, 300–01
  fault, 308
  hazard, 342, 351
  strategies (risk management), 650
  vulnerability, 378

B

B method, 49, 300, 301, 357
banking system, Internet, 505
baselines, 734, 735, 736
batch processing systems, 25
behavioral models, 154–59, 163
beta testing, 58, 60, 249–250
bidding (projects), 669, 671–72
bindings, 527–28
blackboard model, 180
block diagrams, 170, 199
Boehm’s spiral process model, 48
Booch’s software architecture catalog, 170
boundaries (system models), 141–42, 163, 199, 556–57
  modeling workflow, 67–68
  open-source software and, 221
  policies (rules), 262
  process maturity models, 67–68
  process reengineering, 276–78
  processes, 262
  rapid software development and, 73–74
  requirements changes, 131
  resilience and, 426–27
  security and, 380–382
  services, 534, 541–47, 548
  social change and, 24
  software systems, 24, 27, 45, 68, 267–68
  system construction by composition, 543–44
  system values, 267–68, 280
  web-based applications, 27
  workflow, 542, 543, 544–46

C

C and C++ programming languages, 197, 327, 330, 359, 360, 401, 444, 619
callbacks, 445
catalog interface design, 537–538
centralized systems, version management of, 735, 737
certification (software dependability), 294, 299, 302,

branching, 734, 739

354, 355–56, 474, 477, 709–10

broadcast (listener) models, 202

change, 61–65. See also process change

Brownfield systems, 94, 256

agile methods and, 73–74, 78, 90–91, 97

BSD (Berkeley Software Distribution)

business and social needs, 24

license, 220

cost effectiveness of, 133

Bugzilla, 216

cultural (social), 24, 97

build system, 741–42

customers and, 748–49

burglar alarm system, 614, 622, 629–31

effects on software engineering, 27–28

business-critical system, 287

extreme programming (XP) and, 78

business process layer, 292

implementation, 134, 259–60, 280

Business Process Modeling Notation (BPMN),

incremental delivery, 62, 64–65

544–46

plan-driven process and, 73

business process models, 544–46

problem analysis and, 133

businesses

prototyping, 62–63

activity diagrams (UML) for processes, 143–44

rapid software development for, 73–74

interrelated 4 R’s approach, 426–27

requirements management for, 111, 130–34

legacy system evolution, 261–68

reuse, 27–28

maintenance costs, 274–76, 279

rework for, 61, 73

change anticipation, 61

completeness, 107, 129

change control board (CCB), 748–49

complexity, 18, 93–96, 274–75, 278, 584–87, 606

change management, 97, 731, 745–50, 753

governance, 586–87, 588–90, 606

agile methods and, 97, 748, 750

large systems, 93–96

change requests, 747–50

maintenance prediction and, 274–75

dependability and, 299

management, 585, 586–87, 587–90, 606

development environments and, 217

reductionism for systems, 590–93, 606

requirements and, 111, 130–34

refactoring, 278

change proposals, 90, 258–59

scaling agile methods and, 93–96

change request form (CRF), 747–48

system releases, 751–52

change tolerance, 61

systems of systems (SoS), 584–87, 606

characteristic error checking, 359–60

technical, 585, 586–87, 590

check array bounds, 330

compliance to software regulation, 294–95

checking requirements, 317

component-based software engineering (CBSE), 442,

checklists, 403, 713–714

464–489

checksums, 745

component certification, 474, 477

circular buffer, 616–17

component management, 474, 476

class diagrams, 141, 149–51, 163

development for reuse, 473, 474–77

class identification, 202–04

development with reuse, 473, 477–80

Cleanroom process, 230, 332

middleware and, 465, 472–73

client-server architecture, 180–82, 428, 501,

service-oriented software v., 466–67

503–06, 517

component level (reuse), 214

client-server systems, 499–501, 517

components (software), 52–53, 188, 190, 295, 424,

clouds, 25, 27, 532

465–73, 487, 526–29

COBOL code, 263

architectural design and, 172

COCOMO II modeling, 276, 476, 686–96

communications, 172, 218, 526–29

application composition model, 688–89

composition, 480–86, 487

cost drivers, 692

defined, 465, 467, 487

early design model, 689–90

deployment, 471, 472–73

post-architectural level, 692–94

design and selection of, 57, 424, 452

project duration and staffing, 694–96

external, 330–31

reuse model, 690–92

implementation, 465, 466, 471–72, 475, 487

code coverage, 243–44, 252

incompatibility, 481–83

code inspection and review, 83, 715

interfaces, 208–209, 237–239, 465, 468–69

Code of Ethics and Professional Practice (software

measurement (analysis), 722–23

engineering), 29–30

models, 470–73, 487

codelines, 734, 735, 736, 739

open-source, 220–21

collaborative systems, 588

platforms for, 466–67

collective ownership, 78

remote procedure calls (RPCs)

COM platform, 466

for, 470, 471

Common Intermediate Language (CIL),

reuse, 52–53, 212, 214, 221, 438–439,

470–71

452, 468, 487

communication

services v., 521

data management layer and, 292

service-oriented architectures (SOA), 526–29

message exchange, 496–97, 526–29, 537

testing, 59, 232, 237–239

stakeholder, 169

timeouts, 330–31

communication latency, 218

timing errors, 238–239

compartmentalization, 399

components (system), procurement (acquisition) of,

competence, 28

567–68

composition

CORBA (Common Object Request Broker

of components, 480–86, 487

Architecture), 466, 493, 507

service systems and, 541–47

cost/dependability curve, 290–91

computation independent model (CIM), 159–61

cost drivers, 692

computer science, software engineering v., 20, 23

costs. See also estimation techniques

concept reuse, 439

change analysis and, 133

conceptual system design, 553, 563–66, 577, 594

COCOMO II modeling, 686–96

conceptual views, 174, 192

dependability and, 290–91

concurrency, 491

distributed systems, 495

confidence levels (verification), 228–29

effort, 669

confidentiality, 28, 374, 413

fault removal, 308–09

configurable application systems, 442, 454–457

formal verification, 357

configuration management (CM), 213,

maintenance/development, 274–76, 279, 280

215–216, 222, 730–55. See also change

overhead, 669

management

project planning, 669

activities of, 215–16

safety engineering and, 357, 362–63

agile methods and, 732, 742–43, 748, 750

software engineering, 20

architectural patterns, 175

software reuse and, 214, 439

change management, 731, 745–50, 753

system failure, 286

design implementation and, 213, 215–16, 222

COTS (commercial-off-the-shelf) systems, 453. See

problem tracking, 216

also application system reuse

release management, 216, 731, 750–53, 754

critical systems, 287. See also safety-critical

system building, 731, 740–45, 753

systems

system integration and, 215–16

agile methods and, 75

terminology for, 734

dependable processes for, 297

version management (VM), 215, 216,

documentation for, 92, 96

731, 735–40, 753

failure of, 287, 303

configuration, software product lines, 451–52

formal methods for dependability of, 302

ConOps document standard, 563

redundancy and, 295

consistency, 107, 129, 652

types of, 287, 424

constants, naming of, 331

verification and validation costs, 290

construction phase (RUP), 46

cultural change, 97

consumer/producer processes (circular buffer),

customer involvement (agile methods), 76, 77, 91,

616–17

748, 750

container systems, 603–05

customer testing, 59

context models, 141–44, 163, 199–200

customization, 471, 732–33

contingency plans, 650–51

cybersecurity, 376, 412–416, 432

continuous integration, 78, 742–43

control

application frameworks and, 445

cybersecurity, 413–414

D

inversion of, 445

safety-critical systems, 341–42

security, 377, 378–79

damage limitation, 342, 351

visibility of information, 325–26

data clumping, 279

control metrics, 717

data collection systems, 25, 202

controlled systems, 319

data flow diagrams (DFD), 154–55

cooperative interaction patterns, 175

data reengineering, 277

coordination services, 534, 548

database design, 57

data-driven modeling, 154–55

implementation and, 47, 56–58, 69, 196–225

data-feed systems, 602–03

interface, 57, 208–09, 222

data-mining system, 508

life-cycle phase, 47

deadlines (real-time systems), 627

models, 204–08, 222

debugging, 58, 216, 232, 244

object-oriented, 198–209, 222

decentralized systems, 510–11, 517

open-source development, 219–21, 222

Decorator pattern, 212

patterns, 209–12

defect testing, 58, 227–28, 232

for recovery, 400–01

debugging v., 58, 232

for resilience, 424–32

performance, 248

reuse and, 57, 212, 213–15

release testing, 248

service interfaces, 533, 536–40

deltas (storage management), 740

test-case, 234–37

denial-of-service attacks, 289–90, 389, 423

UML documentation, 197, 198–209

Department of Defense Architecture Framework

user interface, 62

(DODAF), 601

design-time configuration, 451–52

dependability (software dependability), 26, 285–305

‘desk’ testing, 428

activities for, 298

development

assurance, 353–56, 402–04

agile techniques, 77–84, 88, 732

costs of, 290–91

customization stages, 732–33

critical systems, 287, 290, 297, 302

configuration management (CM)

design considerations, 287, 295

phases, 732–33

formal methods and, 299–302, 303

engineering design and programming, 23, 44

functionality v., 286

evolution and, 23, 60–61, 256–57, 280

properties, 288–91

implementation stage, 56–58

redundancy and diversity, 295–97, 303

maintenance costs, 274–76, 279

reliability and, 288–90, 297, 303

maintenance v., 60–61

safety and, 288, 299

pair programming, 83–84

security and, 22, 26, 288, 376–79

plan-driven process, 59–60, 570

sociotechnical systems, 291–95, 303

professional software, 19–28

specification and, 300–02

refactoring, 51, 62, 80–81

system, 268, 286–91, 303

regulators for safety, 353

dependable programming guidelines, 325–31

reuse and, 52–54

deployment

reuse for (CBSE process), 473, 474–77

component model, 471, 472–73

reuse with (CBSE process), 473, 477–80

design for, 399–400

safety cases, 362–63

service implementation and, 540–41

safety-critical systems, 352–53

system development and, 570

services and, 541–47, 548

systems of systems (SoS), 595, 597–99

sociotechnical systems, 291–295, 303

UML diagrams, 149, 218

software dependability and, 290

deployment-time configuration, 451–52

spiral model for, 256–57

derivation history, 750

system processes, 554, 570–74

design (software design), 44, 56–58, 69, 78,

testing, 58–60, 81–83, 230–32

196–225. See also architectural design;

development team, 85, 90, 92–93

system design

development testing, 231–42, 252

activity model (diagram), 56

development view, 174, 192

configuration management, 212, 215–16, 222

digital art, 566

for deployment, 399–400

digital learning environment (iLearn), 38–39

engineering programming and, 23, 44, 58

application programming interface (API), 39

guidelines, 396–401, 405

architecture (diagram), 38–39

elicitation of requirements, 118–20

domain-specific application systems, 438, 441, 446

layered architecture of, 179

duplicate code, 279

photo sharing story, 118–20

dynamic metrics, 720–21

services, 38–39

dynamic model, 199, 205, 206, 222

Virtual Learning Environment (VLE), 38

dynamic perspective (RUP), 46

directed systems, 588

dynamic systems development method (DSDM), 73

distributed architectures, 171

distributed component systems, 501, 506–09, 517

distributed development (Scrum), 88

distributed systems (software engineering), 490–519

E

advantages of, 491, 517

architectural design of, 171–72, 182

architectural patterns for, 175–84, 501–12, 517

e-commerce systems, 188–89

attack defense, 494–95

early design model, 689–90

client-server architecture, 180–82, 501,

Eclipse environment, 32, 216, 218, 219

503–06, 517

efficiency, 22, 422–23

client-server systems, 499–501, 517

effort cost, 669

CORBA (Common Object Request Broker

effort distribution, 272

Architecture), 493, 507

egoless programming, 83

design issues, 492–96, 517

elaboration phase (RUP), 46

interaction models, 496–97

elicit stakeholder requirements, 450

middleware, 498–99, 517

elicitation/analysis for requirements, 55,

openness, 491, 492, 493

112–20, 134

quality of service (QoS), 492, 495

embedded software systems, 25, 32, 634. See also

scalability, 491, 492, 494, 514, 515–16

real-time systems

software as a service (SaaS), 512–16, 517

architectural patterns and, 620–26, 634

version management of, 735, 737–39

design of, 217–18, 613–20

diversity (software diversity)

host-target development and, 217

application types, 24–25

real-time software engineering, 218, 610–37

dependability and, 295–97, 303

simulators for, 217

fault-tolerant architecture, 318, 322, 323–25

stimulus/response model, 613–14, 634

redundancy and, 318, 398

timing analysis, 626–31

reliability and, 318, 322, 323–25, 336

user testing, 251

risk reduction and, 398

emergency call log, 422–23

software engineering, 24–27

emergency repair process, 260–61

documentation, 19, 40, 49, 56, 73–75, 92–93, 273

emergent properties, 558, 559–61, 577

agile methods and, 73–75, 86, 89–90, 92–93, 126

encryption, 413

architectural design and, 175

enduring requirements, 132

certification and, 294, 299, 302

engineering, see software engineering; systems

change implementation, 260

engineering

maintenance and, 92, 273

Enterprise Java Beans (EJB), 446, 466, 470, 507

organization of, 127–28

enterprise systems, 422, 552. See also

reader requirements, 103–04

ERP systems

safety cases, 361–67

entertainment systems, 25

software requirements (SRS), 126–29, 135

environment assessment (legacy systems), 269

standards, 129, 706

environmental adaptation, 271

system release, 741, 752–53

environmental control pattern, 620, 623–25

TDD and, 244

environmental specialization (software product

user requirements, 73, 126–27

lines), 450

environments. See also IDEs

life cycle, 257–58, 266

architectural patterns and, 176

maintenance, 22, 60–61, 270–79, 280

business requirements changes, 131

processes, 258–61

context model for, 142–43

program evolution dynamics, 271

marketing, 229

refactoring and, 61, 78, 273,

software interaction and system failure, 293–94

278–79, 280

work, 663

requirements changes, 131

equipment layer, 292

servicing v., 257–58

equity trading system, 394–95

software lifetime and, 256–57

equivalence partitioning, 235–236

software reengineering, 273, 276–78

ERP (Enterprise Resource Planning) systems, 21,

spiral model of, 256–57

184, 438, 442, 454–457

system evolution v., 575–76

application frameworks, 446

exceptions

architecture of, 455–456

CBSE for reuse, 476–77

configurable application reuse, 454–457

handlers for, 327–28

customer adaptation of, 438

Executable UML (xUML), 162

system procurement and adaptation, 569

execution time (real-time systems), 627

error-prone constructs, 308, 328–29

experience-based estimation, 683–84

error tolerance, 289

experience-based testing, 403

errors

explicitly defined process, 297

algorithmic, 351–52

exposure, 377, 378, 379

arithmetic, 351

external components, 330–31

avoidance and discovery, 300–01

external requirements, 109

checking, 359–61

extreme programming (XP), 73, 77–84

correction, 48

acceptance testing and, 77, 82

failure and fault v., 308

agile methods and, 73, 77–79

human, 307, 351–52, 418–21

continuous integration and, 78, 96

safety engineering and, 359–61

pair programming, 78, 83–84

specification, 324–25

release cycle in, 77

static analysis for, 359–61

story cards, 79–80

system, 307–09

test-first development, 78, 81–83, 242

timing, 238–39

user requirements, 73, 99

estimation techniques (project planning), 682–86,

696

algorithmic cost modeling, 683, 684–86

COCOMO II model, 686–96

F

experience-based techniques, 683–84

software productivity and, 686

ethical/professional responsibility, 28–31, 40

façade pattern, 211

ethnography technique, 116–18

failure propagation, 560–61

evaluation, prototype phase of, 63

failures, see also system failure

event-driven modeling, 156–58

definition v. judgment, 310

evolution (software evolution), 69, 255–82

error and fault v., 308

activity model (diagram), 61

hardware, 287

agile technique and, 261

human errors and, 307, 351–52, 418–21

business costs and, 274–76, 279

information loss, 286

development v., 60, 256–57, 280

operational, 287

engineer activities for, 20, 23, 44

safe state solutions to, 351–52

legacy systems, 261–70, 280

server safety v. privacy, 36

software, 18, 22, 26, 287, 308, 310,

G

340–41, 351–52

system failure costs, 286

fault (system faults), 307–09

‘Gang of Four,’ 209–12

avoidance, 308

General Public License (GPL), 220

costs of removal, 308–09

generalization of structural models, 152–53, 205

detection and correction, 308

generator-based reuse, 443

error and failure v., 308

Git system, 216, 737, 740

repair, 271

GitHub, 476, 478

tolerance, 308

‘glue code,’ 466, 481, 487

fault-tolerant architectures, 318–25, 491

GNU build system, 216

distributed systems, 491

GNU General Public License (GPL), 220

diversity of software, 323–25

Google Apps, 27

N-version programming, 322–23

Google Code, 478

protection systems, 319–20

governance complexity, SoS, 586–87,

self-monitoring, 320–22

588–90, 606

fault tree analysis, 349–51

graphical models, 140

feasibility studies, 54, 104

graphical notations, 121

Federal Aviation Administration, 92, 290

groups, see teamwork

federated systems, 589

growth modeling, 334

film library, client-server architecture for, 182

guideline-based testing, 234

firewalls, 413–14

guidelines

flight control software, 296, 321–22, 340, 341

hiring, 661

floating-point numbers, 329

dependable programming, 325–31

formal (mathematical) models, 139

system security, 401–02, 405

formal methods (software development), 49, 139,

299–302, 303, 356–58

B method, 49

dependability and, 299–302, 303

H

error avoidance and discovery from, 300–01

mathematical approach, 300, 301

model-checking, 300, 358–59

handlers, exceptions, 327–28

safety engineering, 356–59

hardware (system), 262

security testing, 404

hardware failure, 287, 560–61

system models and, 139, 299–301

hazard-driven approaches, 342, 349–51, 368

verification and, 300, 356–58

hazards, 342, 343, 345–51

formal specifications, 109, 300–02

analysis of, 345, 349–51

Fortify tool, 404

assessment, 345, 346–49

4 Rs model, 410–11, 414–15, 432

avoidance, 342, 351

4+1 view model, 173–74

damage limitation, 342, 351

frameworks, 443–46, 600–02, 708–10

detection and removal, 342, 351

Free Software Foundation, 219

fault tree analysis, 349–51

frequency (real-time systems), 627

identification of, 345–46

fuel delivery system, 618–19

probability, 343

functional requirements, 105–07, 134, 312, 317–18,

safety-critical system development, 342, 368

335, 344

severity, 343

functional specialization (software product lines),

heterogeneity, software development and, 24

450

hierarchical composition, 480

functionality, 286

hierarchical groups, 661–62

high-availability systems, 172, 218

inheritance, 152, 204, 209, 233, 722. See also

honesty (people management), 653

generalization

host-target development, 213, 216–18, 222

input/output mapping, 310–11

HTML5 programming, 28, 445

inputs, validity checks of, 326–27, 399

http and https protocols, 530–31

inspections, 229–30, 239, 710–714. See

human error, 307, 351–52, 418–21

also reviews

human needs hierarchy, 653–54

insulin pump control system, 32–34

activity model of, 33, 155

data-flow model (DFD) for, 155

dependability properties for, 288–89

I

failure in, 316–17

functional reliability requirements, 317

hardware components (diagram), 33

IDEs (Interactive Development Environments),

hazards in, 346

53, 217

natural language specification for, 122

Eclipse environment and, 218

non-functional reliability requirements, 316–17

general-purpose, 218

permanent software failure, 316

host-target development and, 216, 217–18, 222

risk classification for, 347–49

repository architecture for, 180

risk reduction for, 351–52

iLearn, 38–39, 567. See also digital learning

safety-critical system control, 341

environment

safety requirements for, 346–349, 351–52

implementation (system implementation), 28, 47,

safe state, 351

56–58, 69, 196–225

sequence diagrams for, 155

components, 465, 466, 471–72, 475, 487

software control of, 341

configuration management, 212, 215–16

software failure solutions, 351–52

design and, 56–58, 69, 196–225

structured language specification for, 123–24

interface specification, 208–09

tabular specification for, 124

life-cycle phase, 47

transient software failure, 316

host-target development, 213, 216–18

issue-tracking systems, 746–47

open-source development, 219–21

integrated application systems, 442, 454

reuse and, 212, 213–215

integration

service deployment and, 540–41

configuration and, 46, 52–54

service-oriented software for, 28

continuous, 78, 742–43

UML documentation, 197, 198–209

system development and, 570

unit testing and, 47

system testing and, 48

in-car information system, 522–24

systems of systems (SoS), 595, 597–99

inception phase (RUP), 46

integrity, security and, 374, 413

inclusion (people management), 653, 657

intellectual property rights, 28

incompatibility, component composition

interacting workflows, 545–46

and, 481–83

interaction models, 144–49, 163, 199–200,

incremental delivery, 46, 51, 62, 64–65,

496–97

76, 91

distributed systems, 496–97

incremental development, 46, 50–51, 73–74, 77

object-oriented design and, 199–200

incremental testing, 59, 242

sequence diagrams, 146–49, 163

incremental integration, 242

use cases, 144–46, 163, 200

incremental planning, 78

interactive applications, 25

information loss, 286

interface design, 57, 208–09

information systems, 32, 185–86, 187–89, 522–24

interface misunderstanding, 238

infrastructure security, 374, 375–76

interface misuse, 238

interfaces

legacy systems, 261–70, 280, 540, 576

application programming interfaces (APIs),

assessments, 269

595–96

business value of, 267–68, 280

component, 208–09, 222, 237–239, 465, 468–69,

component integration, 567

470–71

elements of, 262–63

model specifications, 470–71

management, 266–70

service design for, 533, 536–40, 596

maintenance of, 263–64, 280

specification, 208–09

reengineering and, 276, 278

systems of systems (SoS), 595–97

refactoring and, 279

unified user interface (UI), 596–97

replacement problems, 264–65

Internet banking system, 505

system evolution of, 546

interviewing techniques, 115–16

wrapping, 278, 442, 540

intolerable risks, 347

Lehman’s laws, 271

inversion of control, 445

Lesser General Public License, GNU, 220

ISO 9001 standards framework, 708–10, 734

licensing, 220–21, 356

iteration planning, 680

life cycles

iterative development/delivery, 65, 77, 98. See also

application system reuse problems,

agile methods

459–60

Iterator pattern, 212

project planning stages, 668

software evolution, 257–58, 266

software model process, 45, 47–49

lifetimes, system evolution and, 575–76

J

Linux, 219, 398

logging user actions, 398

logical view, 174, 192

Java programming language, 82, 152, 161, 197, 208,

long methods, 279

218, 219, 327, 330, 359, 444

embedded systems development and, 619–20

interfaces, 208

program testing, 243

M

real-time systems development and, 619

Java Virtual Machine, 217

JavaMail library, 214

maintainability, 22, 104, 169, 173, 198, 230, 266,

Jenkins, 743

274, 275, 289, 494

JSON (JavaScript Object Notation), 531

maintenance (software maintenance),

J2EE platform, 161, 466

22, 270

JUnit, 59, 82, 217, 233, 243

agile methods and, 90, 92

architectural design and, 172–73, 178

costs, 274–76, 279

development v., 60–61

L

documentation and, 92, 273

legacy systems, 263–64

life-cycle phase, 48

language processing systems, 186, 189–91, 192

prediction, 274–76

large-scale systems, 556

reengineering, 273, 276–78

layered architecture, 177–79, 187–88, 192

refactoring, 278–79

layers

software evolution and, 22, 263–64,

legacy systems, 262–64

270–79

sociotechnical systems, 292–93, 557–58

types of, 271, 280

management (software management), 26, 66–68,

patient monitoring, 35

84–88. See also configuration management;

privacy and, 36

process improvement; project management;

process model of involuntary detention, 143

project planning; quality management;

release testing, 246, 247

version management

requirements-based testing and, 246

agile methods, 84–88

resilience of, 289, 428–30

automated, 423–24

safety and, 36

CBSE process, 474, 476

safety-critical system control, 342

coping with change, 63

scenario in, 124–25

planning, 132–33

scenario testing and, 247

process maturity method and, 66–68

security of, 289, 377, 400–01

real-time system processes, 632–34

sequence diagrams for, 146–49

requirements change, 130–34

sociotechnical system for, 562–63

resilience and, 421–24, 432

story cards and, 79–80

management complexity (SoS), 585, 586–87,

success criteria for, 562–63

587–90, 606

system boundaries, 141–42

manifesto, agile, 75–76, 77–78

task cards and, 79–80

marketing environment, 229

use case modeling and, 145–46

Mars exploration, 358

use cases for, 125–26

mathematical specifications, 121. See also formal

merging, 734, 739

methods

message exchange, 496–97, 526–29, 537

mean time to failures (MTTF), 313, 314

message passing interfaces, 238

measurement. See also metrics

metrics

ambiguity in, 724–25

AVAIL, 313–14

component analysis, 722–23

control/predictor, 717

controller/predictor metrics, 717

dynamic, 720–21

quality management (QM) and, 716–26, 727

events, 717

software analysis, 725–26, 727

non-functional requirements, 110

software quality, 716–26, 727

process measurement, 717–20

mental health care system (Mentcare), 34–36

probability of failure on demand (POFOD),

administrative reporting, 36

313–14, 316

aggregation association in, 153

product, 720–22, 727

authentication procedures, 416

rate of occurrence of failures (ROCOF), 313–314

class diagrams for, 149–151

reliability, 313–14, 316

client-server architecture of, 428

resource utilization, 717

context model of, 141–42

software measurement and, 716–26, 727

design risk assessment, 390–91

static, 720–21

dose checking test case, 80

time, 717

fail-secure approach, 397

Microsoft Office 365, 27

functional requirements in, 106–07

microwave oven scenario, 156–58

generalization hierarchy and, 153

middleware, 217, 218, 446, 465, 472–73,

goals of, 35

498–99

individual care management, 35

milestones (projects), 673, 674, 677–78, 696

key features of, 35–36

minimization strategies (risk management), 650–51

layered architecture pattern in, 179, 188

mission-critical system, 287

non-functional requirements in,

MODAF, 600, 601

109–10

model checking, 300, 358–59, 368

organization (diagram) of, 34

model-driven architecture (MDA), 159–62

passwords, 400–01, 416

model-driven engineering (MDE), 158–59, 442

modeling systems, 25, 138–66

motivation (people management), 653–56

models, 45–54, 138–66. See also spiral models;

multi-tenancy, 514, 515, 516

UML (Unified Modeling Language)

multi-tier client-server architecture, 501, 505–06

activity diagrams (UML) for, 33–34, 141,

MySQL, 219, 445

143–44, 163

activity stages, 47–48, 142

agile approach and, 50, 162

algorithmic cost modeling, 683, 684–86

N

application architecture, 185

behavioral, 154–59, 163

class diagrams for, 149–50

N-version programming, 322–23

COCOMO II, 276, 476, 686–96

namespaces, 528–29

component, 470–73, 487

natural language requirements, 121–22

context, 141–44, 163, 199–200

nested technical and sociotechnical

data-driven, 154–55

systems, 416–17

dynamic, 199, 205, 206, 222

.NET framework, 161, 443, 446, 466, 470–71,

event-driven, 156–57

478, 507

formal (mathematical), 139, 300

non-deterministic properties, 561–62

generalization, 152–53, 205

non-functional requirements, 105, 107–11, 134, 169,

incremental development, 46, 49–51

172–73, 312, 314–18, 547

integration and configuration, 46, 52–54

interaction, 144–49, 163, 199–200, 496–97

ISO 9001 standards framework,

708–10, 734

O

object-oriented design, 199–200, 204–08

open-source licensing, 220–21

processes, 45–54, 68

object and function reuse, 438

project estimation, 682–96, 696

object classes, 149–50, 202–04, 470

quality management (QM) and, 709–10, 719

object constraint language (OCL), 208, 484–85

real-time system design, 617–19

object level (reuse), 214

reliability growth, 334

Object Management Group (OMG), 159

RUP (Rational Unified Process), 46–47

object-oriented metrics, 721–22

reuse-based development, 52–54

object-oriented systems

sequence, 144, 146–49, 155, 163, 205, 206–07

architectural design and, 201–02

spiral, 63, 256–57

class diagrams for, 149–50

state machine, 205, 207–08, 222, 617–18, 634

class identification, 202–04

state-based, 156–158, 163

design, 198–209, 222

static, 205, 222

frameworks in, 444

stimulus/response, 613–14, 634

interface specification, 208–09

structural, 149–54, 163, 199, 205

system (design) models, 204–08

subsystem, 205–06

Unified Modeling Language (UML) and, 140,

‘Swiss cheese,’ 420–21

198–209

of testing process, 230–31

use case model, 200–01

UML (Unified Modeling Language), 33–34, 139,

Objectory method, 125

140–41, 144–49, 713

observe and react pattern, 620, 621–23

use case, 125–26, 141, 144–46, 163, 200–01

Observer pattern, 210–11

model-view-controller (MVC) pattern, 176–77,

on-site customer, 78

179, 444

openness, distributed software, 491, 492, 493

monitoring projects, 651–52, 673

open-source development, 219–21, 222, 738–39

operating system layer, 292

physical view, 174, 192

operating systems (real-time), 631–34, 635

pipe and filter architecture, 182–84, 191

operation and maintenance, 48

plan-driven process, 45, 47, 50, 73, 570

operation incompatibility, 481

agile methods v., 45, 74–75, 91–93, 98

operation incompleteness, 481

changing environment and, 73

operation stage (systems), 554

incremental development and, 50

operational failure, 287

model processes, 47, 50

operational processes, 421–24, 432

project planning and, 672–75, 696

operational profiles, 334–35

scheduling and, 675–76

operational security, 374, 376

system development and, 570

operator reliability, 287, 560–61

testing (validation) phases, 59–60

Oracle, 21, 219

waterfall model, 47–48

organizational design patterns, 175

planning game, 681–82

organizational layers, 292, 557

planning. See also project planning

organizational requirements, 108–09

incremental, 78

organizational systems, 589

requirements management, 132–33

organizations and security, 380–82

risk, 650–51

overhead costs, 669

Scrum product backlog, 85, 86, 98

overspecification of reliability, 315

test, 231

platform-independent model (PIM), 159–61

platform-level protection, 393–94

platform services, 472

P

platform specialization (software product lines), 450

platform-specific models (PSM), 160–61

packing robot control system, 168

plug-in architecture, 218

pair programming, 78, 83–84, 715

pointers, 308, 329

parameter definition, 452

post-architectural level, 692–94

parameter incompatibility, 481

power supply failure, 627–28

parameter interfaces, 237

practice perspective (RUP), 46

partition testing, 234–36

prediction, maintenance and, 274–76

partner company software systems, 49

predictor metrics, 717

password checker, 392

PRISM model checker, 358

passwords, 400–01, 413, 414, 416

probability of failure on demand (POFOD), 313–14, 316

path testing, 237

patient records system (PRS), 148–49

probability values, hazards, 343

patterns, 175–84, 209–12, 442, 444

problem tracking, 216

application frameworks and, 444

procedural interfaces, 238

architectural, 175–84

process (software processes), 23, 26, 43–71

design, 209–12, 442

activities, 40, 54–61

payment models, 547

agile approach, 45, 66

peer-to-peer (p2p) architecture, 501, 509–12, 517

analysis, 67, 112–20, 626–31

penetration testing, 403–04

assurance, 353–56

People Capability Maturity Model (P-CMM), 656

design and implementation, 56–58

emergency repair, 260–61

people management, 652–56, 664

engineer activities for, 20, 23, 44, 54–61

performance, 172, 248

evolution, 44, 60–61, 258–61

periodic stimuli, 613

improvement of, 65–68

photo library, 483–85

life cycles, 45, 47–49


management, 421–24, 432

product architects (Scrum), 96

maturity approach, 66–68

product backlog (Scrum), 85, 86

measurement, 66–67, 717–20

product owner (Scrum), 85

models, 45–54, 68

product risk management, 644–45, 646

operational, 421–24, 432

professional software development, see development

plan-driven, 47–48

professional, 19–28, 45

program evolution dynamics, 271

prototype development, 62–63

program generators, 442

quality (process-based), 65–68, 705

program inspections, 229–30, 239, 713–14. See also reviews

quality metrics, 717–20

review phases, 711–13

program libraries, 442

RUP (Rational Unified Process), 46–47

program modularization, 277

specification, 44, 54–56

program structure improvement, 277

standards, 45, 707, 708

programmer/tester pairs, 231–32

validation, 44, 58–60

programming. See also extreme programming

process change, 45, 69

dependable guidelines, 325–31

agile manifesto and, 75–76

egoless, 83

CBSE, 473–80

engineering design and, 23, 44, 58

coping with, 61–65

real-time systems, 619–20

evolution, 258–61

secure system guidelines, 401–02

implementation, 259–60

techniques/activities, 26, 54–56

for safety assurance, 353–56

project management, 84–88, 641–66

software processes, 61–65, 67

activities, 643–44

urgent changes, 260

agile methods and, 84–88, 643, 647, 661

process improvement, 69

differences from engineering, 642–43, 664

agile approach, 66

motivation and, 653–56

business values, 267–68

relationships with people, 652–56, 664

legacy system management, 266–70

risk management, 644–52, 664

process maturity approach, 66–68

teamwork, 656–64

reengineering, 276–78

project planning, 92–93, 667–99

refactoring, 278–79

agile methods and, 670, 680–83, 696

software quality and, 65–68

bidding, 669, 671–72

software evolution and, 266–70, 276–79

COCOMO II cost modeling, 686–96

process management, real-time systems, 632–34

development team effectiveness, 92–93

process maturity approach, 66–68

duration and staffing, 694–96

process pipeline pattern, 620, 625–26

estimation techniques, 682–86, 696

process requirements, 317

life cycle stages of, 668

process specialization (software product lines), 450

milestones, 673, 674, 677–78, 696

process view, 174, 192

plan-driven development and, 672–75, 696

producer/consumer pattern, 202

producer/consumer pattern, 202

process, 673–75

producer/consumer processes (circular buffer), 616–17

project costs, 669, 696

scaling agile methods for, 91–93

product

scheduling and, 675–80, 696

instance development, 450

software pricing for, 670–72, 696

quality metrics, 720–22, 727

supplements, 673

requirements, 108–09

user stories for, 681–82

software types, 20–21, 24–26

project risk management, 644–45

standards, 706, 707

Promela, 358


protection, 383

recognition, 410, 411, 414–15, 432

assets, 380, 384, 390

record-level protection, 393–94

cybersecurity, 376, 414

recovery

fault-tolerant architecture, 319–20

database integrity checking and, 430

layered architecture design, 393–95

design for, 400–01

systems, 319–20, 414

requirements, 317

prototyping (system prototyping), 62–63, 69, 117, 130

resilience and, 411, 414–15, 430, 432

Python, 190, 197, 198, 327, 444

reductionism of complex systems, 590–93, 606

redundancy

dependability and, 295–97, 303

diversity and, 318, 398

Q

requirements, 317

reengineering (software reengineering), 273, 276–78, 280

quality management (QM), 299, 700–29

refactoring, 51, 62, 78, 80–81, 83–84, 168, 278–79

agile development and, 714–16, 727

agile methods, 51, 80–81

configuration management (CM) and, 733

architectural design and, 168

documentation standards, 706

extreme programming (XP) methods, 78

reviews and inspections, 710–14, 727

maintenance and, 278–79

software development and, 701–02

pair programming, 83–84

software measurement/metrics and, 716–26, 727

software evolution, 273, 278–79, 280

software quality and, 703–05, 727

reference architectures, 191

software standards and, 706–10, 727

refinement-based development, 300

quality of service (QoS), 492, 495

regression testing, 244

quantitative reliability specifications, 314–15

regulation and compliance (software), 294–95, 353

regulators, 294–95, 361, 362, 368

reinstatement, 411, 414–15, 432

release alignment (Scrum), 96

R

release management, 216, 731, 750–53, 754

release testing, 245–48

reliability, 309

range checks, 326

availability and, 309–12

rapid software development, 73–74

dependability and, 288–90, 297, 303, 336

rate of occurrence of failure (ROCOF), 313–14

diversity and, 318, 322, 323–25, 336

reactive systems, 612

emergent properties, 560–61

realism checks, 129

failure and, 18, 307–12, 560–61

real-time systems, 205, 218, 610–37

fault-tolerant architectures, 318–25

architectural patterns for, 620–26, 634

functional requirements, 312, 317–18, 335

design, 205, 613–20

growth modeling, 334

embedded systems, 218, 610–37

human error, 307

modeling, 617–19, 634

measurement of, 331–35

operating systems, 631–34, 635

metrics, 312–13, 332, 335

process management, 632–34

non-functional requirements, 312, 314–18

programming, 619–20

operational profiles, 334–35

responsiveness, 611–12

overspecification of, 315

software engineering for, 610–37

programming guidelines, 325–31

stimulus/response model, 613–14, 634

requirements, 312–18, 335

timing analysis, 626–31, 635

safety and, 340–41

reasonableness checks, 327

security and, 379


sociotechnical systems, 560–61

user, 102–03

software, 18, 560–61

validation, 55, 129–30, 135

specification, 314–18

volatile, 132

system error, 307–09

requirements engineering (RE), 69, 101–37

system fault, 307–09

change management, 111, 130–34

systems, 18, 19, 22, 288–90, 297, 303, 306–38

documents for, 103–05

statistical testing, 332–33, 336

elicitation/analysis process, 112–20, 134

remote method invocations (RMIs), 497

ethnography technique for, 116–18

remote procedure calls (RPCs), 470, 471, 497

feasibility studies, 54, 104

repairability, 289

interviewing techniques for, 115–16

repeatable process, 297, 303

processes, 111–12, 134

replicated servers, 318

software process activities, 44, 54–56

repository architectural pattern, 179–80, 190

software documentation (SRS) for, 126–27

repository cloning, 737–38

spiral model for, 112

representation checks, 327

system development and, 570

requirements, 102, 134

requirements partitioning, 571

agile methods and, 55, 131–32

research management systems, 448–49

analysis and definition (life-cycle phase), 47

resilience (system resilience), 288, 408–34

availability, 218

activities, 410–11

business changes, 131, 135

automated management, 423–24

classification and organization of, 113

cybersecurity, 412–16, 432

components, 218

dependability and, 288, 289

discovery and understanding, 113, 115–18

design for, 424–32

documents (software specification), 103–04, 111, 114, 126–29, 135

design for, 424–32

engineering, 408–34

elicitation and analysis of, 55, 112–20, 134

4 Rs model, 410–11, 414–15, 432

enduring, 132

human error and, 418–21

engineering understanding of, 20, 23, 26

interrelated business approach, 426–27

evolution, 131

management, 421–24, 432

functional, 105–07, 134, 317–18

operational processes, 421–24, 432

hazard-based, 345

security and, 288, 379

identification, 132

sociotechnical systems, 416–24

management, 132–134, 135

survivable systems analysis, 425–26

non-functional, 105, 107–11, 134, 314–17

system failure and, 410–12

notations for writing, 121

testing, 427–28

prioritization and negotiation of, 113

resistance, 410–11, 414–15, 432

refinement, 53

resource management systems, 188–89, 192

reliability and, 312–18

reviews, 130

resource sharing, 491

risk-based, 344, 345

respect (people management), 652

safety, 344–52

restart capabilities, 329–30

specification, 55, 69, 102–03, 106–07, 110, 120–29, 135, 314–18, 344, 345

restaurant interactions, 496–97

RESTful services, 524, 529–33, 544

spiral model for, 572

reuse (software reuse), 26, 28, 46, 52–54, 169, 209–10, 212, 213–15, 222, 437–63, 474–80

software process, 44, 54–56

storage, 132

system, 102–03, 120–21

application frameworks, 442, 443–46, 460

testing (requirements-based), 245–46

application system, 438, 453–60

traceability, 132, 133

approaches supporting, 441–43


reuse (continued)

security assessment, 381–82, 405

architectural design and, 169

triangle, 347–48

CBSE for, 473, 474–77

risk management, 644–52, 664

CBSE with, 473, 477–80

identification of risk, 647–48

component selection and design, 57

planning process, 650–51

components, 52–53, 212, 214, 221, 438–39, 452, 468, 487

processes, 645–47

product risks, 644–45

costs of, 214, 439

project risks, 644–45

design patterns, 209–10, 212, 442, 444

risk analysis and, 648–49

engineering applications of, 26, 28

risk monitoring, 651–52

generator-based, 443

strategies for, 650–51

implementation and, 212, 213–15

risk-based requirements specification, 344, 345

integration and configuration of, 52–54

robot control system, 168

integration problems, 459–60

role replication (Scrum), 96

landscape, 440–43

Ruby, 190, 444

levels of, 213–14

RUP (Rational Unified Process), 46–47

object and function, 438

process model for, 52–53

software development tools, 53

software product lines, 442, 446–52

system features and, 46

S

reuse-based software engineering, 53–54, 438

reuse model, 690–92

safety, 339–72, 379

reverse engineering, 277

architectural design and, 172

reverse planning, 680

assurance processes, 353–56

reviews, 130, 229, 239, 710–14

costs and, 357, 362–63

checklists, 713–14

dependability and, 288, 299

code, 83, 715

engineering processes, 352–61

hazard register for, 355

ethics and, 30–31

inspections and, 229, 710–14, 727

formal verification, 356–58

program inspections, 713–14

functional requirements, 344

quality management (QM), 710–14, 727

hazard-driven requirements, 345, 368

requirements validation, 130

hazards and, 342, 343, 345–51

review process, 711–13

model checking, 358–59, 368

safety, 354, 355

regulation and compliance for, 294–95

verification and validation using, 229

regulators, 294–95, 361, 362

rework, 49, 56, 61, 73, 75, 84, 129

reliability and, 340–41

risk

requirements, 344–52, 362

acceptable, 347–48

risks and, 343–44, 347–48, 351–52

accidents (mishaps) and, 343–44, 347

software certification, 355–56

analysis, 362, 648–49

static program analysis, 359–61, 368

as low as reasonably practical (ALARP), 347

terminology, 343

defined, 343

safety cases, 361–67, 368

redundancy and diversity for, 398

development of, 362–63

indicators, 652

organization of, 361–62

intolerable, 347

regulators for, 361, 362, 368

ranking types of, 649

software safety arguments, 364–67

reduction, 351–52, 398

structured arguments, 363–64


safety-critical systems, 287, 340–44, 368

programming guidelines, 401–02

certification of, 294, 302, 355–56

protection, 380, 384, 390, 393–94, 395

control systems, 341–42

regulation and compliance for, 294–95

dependability and, 294, 302

reliability and, 379

development process and, 352–53

requirements, 382–88

error-prone constructs and, 329

resilience and, 288, 379

hazard-driven techniques, 342

risk assessment, 381–82, 405

primary safety-critical software, 341

safety and, 379

process assurance and, 355–56

system layers, 374–75

regulation and compliance for, 294, 353

terminology, 377–78

risk triangle for, 347–48

testing, 402–04

secondary safety-critical software, 341–42

threats, 377, 378, 404

system failure and, 340–41

trust and, 22, 24

safety reviews, 355

usability guideline, 397–98

SAP, 21

validation, 405

Sarbanes-Oxley accounting regulations, 51

vulnerability and, 377, 378, 391, 401

scalability, 491, 492, 494, 514, 515–16

self-monitoring architecture, 320–22

scale, software development and, 24

SEMAT (software engineering methods and tools) initiative, 24

scaling agile methods, 88–97, 98

scenarios

semicentralized P2P architecture, 511, 512

elicitation of requirements from, 118–20

sensor-based data collection systems, 32

testing, 246–47, 252

separation of concerns, 486

use cases, 125–26

sequence diagrams, 141, 144, 146–49, 155, 163, 205, 206–07, 241

scheduling, 675–80, 696

activity charts for, 678–80

sequential composition, 480

project planning and, 675–80, 696

server overload, 512–13

plan-driven projects, 675–76

service engineering, 533–41

presentation (visualizing), 676–80

candidate identification, 533–36

Scrum, 73, 78, 85–88, 96, 98

implementation and deployment, 540–41

secure systems, 561

interface design, 533, 536–40

security, 24, 26, 373–407

legacy systems and, 540

application, 374–75

service information exchange (SOAP), 525–26, 531, 544

architectural design and, 172, 388, 392–95

service-oriented architectures (SOAs), 513–14, 520–50

assurance, 402–04

availability, 374, 375, 379

approach, 522, 524

checklist, 403

components, 526–29

confidentiality, 374

message exchange, 526–29

controls, 377, 378–79

service interface, 528

dependability and, 22, 26, 288, 376–79

service protocols, 525

design for, 374, 388–402, 405

software as service (SaS) v., 513–14, 522

engineering, 373–407

failure, 397

standards, 525–26

guidelines, 396–401, 404

web applications, 524–29

infrastructure, 374, 375–76

WSDL and, 526, 527–29

logging user actions, 398

service-oriented software engineering, see service engineering; service-oriented architectures (SOAs); services

operational, 374, 376

organizations and, 380–82

policies, 396–97

service-oriented systems, 442, 466–67, 526–33


service-to-service communication, see integrated services

regulation and compliance, 294–95

resilience and, 416–24

services, 521

success criteria, 562–63

business, 534, 541–47, 548

systems engineering for, 556–59

classification of, 534, 548

software, 19, 20, 228

communication and, 524–29

attributes, 20, 22

components, 521, 526–29

customized (bespoke), 21

composition (construction) of, 541–47

efficiency, 22

coordination, 534, 548

engineering ethics, 28–31

incremental delivery and, 64–65

failures, 18

operation and maintenance for, 48

generic products, 20–21

process models for, 544–46

issues affecting, 24

reusable Web components, 52, 526–29

lifetime, 256–57

reuse of, 542

product types, 20–21, 24–26

software development and, 541–47, 548

professional development, 19–28

testing, 543, 546–47

regulation and compliance of, 294–95

utility, 534, 548

system boundaries and characteristics, 26

web-based, 27–28, 521

software architecture catalog, Booch’s, 170

RESTful approach, 524, 529–33, 544

software as service (SaS), 512–16, 517

service information exchange (SOAP), 525–26, 531, 544

configuration of, 514–15

multi-tenancy, 514, 515, 516

workflow, 542, 543, 544–46, 548

scalability, 514, 515–16

servicing, evolution v., 257–58

server overload and, 512–13

shared memory interfaces, 238

service-oriented architectures (SOAs) v., 513–14, 522

simple design, 78

simple design, 78

‘software crisis’, 19

simplicity (agile methods), 76, 78, 91

Software Development Life Cycle (SDLC) model, 45

simulation systems, 25

software development tools, 53

simulators, 217

software diversity, 318, 322, 323–25, 336

size checks, 327

software engineering, 19–23, 40, 92

SLAM model checker, 358

activities for software process, 20, 23, 44

small releases, 78

computer science v., 20, 23

social change, business and, 24

diversity, 24–27

social layer, 292

engineering discipline, 21–22

sociotechnical systems, 552, 577

ethical responsibility and, 28–31, 40

complexity of, 556, 558–59

formal verification, 356–58

defensive layers, 419–20

fundamental notions in, 26, 40

emergent properties, 544, 559–61, 577

Internet effect on, 20, 27–28

environment and software interaction, 293–94

human error and, 418–21

model checking, 358–59, 368

failure propagation, 560–61

model-driven engineering (MDE), 158–59

human error and, 418–21

product development and, 20–21

layers of, 292–93, 557

reuse-based, 53–54, 438

management, 421–24, 432

safety processes, 352–61

nested technical systems, 416–17

static program analysis, 359–61, 368

non-deterministic properties, 561–62

systems engineering v., 20, 23, 40, 554

operational processes, 421–24, 432

web-based systems, 27–28

organizational elements, 557–58

Software Engineering Institute (SEI), 67


software measurement/metrics, 716–26, 727

standards

software platform, 57

documentation, 706

software pricing, 670–72, 696

ISO 9000 standards framework, 708–10, 734

software product lines, 442, 446–52

process, 45, 707, 708, 734

software quality attributes, 704

product, 706, 707

software requirements specification (SRS), 126–29

quality management (QM) and, 706–10, 727

software, 706–10, 727

software safety arguments, 364–67

service-oriented architectures (SOAs), 524, 525–26

source code translation, 277

SourceForge, 476, 478

value of, 707–08

space shuttle (U.S.) system, 319

web service, 525–26

specialization, software product lines, 450

state diagrams (UML), 141, 163, 205, 207–08

specifications (software specifications), 20, 54–56, 208–09, 300–02

state machine models, 205, 207–08, 222, 617–18

state-based modeling, 156–58

availability, 313

static analyzers, 217

engineering definition and constraints, 23

static metrics, 720–21

functional requirements, 106–07

static models, 143, 205, 222

graphical notations, 121

static perspective (RUP), 46

dependability and, 300–02

static program analysis, 359–61, 368

design interface, 208–09

statistical testing, 332–33, 336

errors, 324–25

stimulus/response (embedded systems) model, 613–14, 634

formal techniques, 300–02

hazard-driven safety requirements, 345

storage management, 132, 740

management of, 26

stories, elicitation of requirements from, 118–20

natural language requirements, 121–22

story cards, 79–80, 99. See also user stories

non-functional requirements, 110

stress testing, 248

problem analysis and, 133

structural models, 149–54, 163, 199, 205

reliability metrics, 313–14

structured arguments, 363–64

risk-based requirements, 344, 345

structured natural language requirements, 121, 122–24

non-functional requirements, 110

software process, 44, 54–56

subsystem engineering, 571, 573

SRS document, 126–29

subsystem faults, 573

structured natural language requirements, 121, 122–24

subsystem model, 205–06

Subversion system, 216, 735

system failure and, 310

support environment, 32

system requirements, 102–03, 120–29, 135

support services, 472

use cases, 125–26

support software, 262

user requirements, 102–03, 120, 135

survivable systems analysis, 425–26

speculative generosity, 279

sustainable pace, 78

SPIN model checker, 358

‘Swiss cheese’ model, 420–21

spiral models, 48, 112, 256–57, 572

switch (case) statements, 279

sprint (Scrum), 85, 86–87

system availability, see availability

SQL (Structured Query Language), 218, 399, 401, 445, 505

system boundaries, 141–42, 163, 199, 556–57

system building, 731, 740–45, 753

stable domain abstractions, 475

system construction by composition, 543–44

staff allocation charts, 678, 680

system design

stakeholders, 103–04, 107, 112–16

actuator control processes, 613–14, 615

stand-alone applications, 25

embedded systems, 217–18, 613–20


system design (continued)

analysis for architectural design, 169

host-target development, 213, 216–18, 222

case study types, 31–32

modeling, 617–19

complexity of, 18, 93–96, 274–75, 278, 552–53, 558–59

producer/consumer processes, 616–17

programming, 619–20

cost effectiveness of, 22–23

real-time systems, 205, 613–20

dependability, 268, 286–91, 303

risk assessment, 389–92

engineering fundamentals for, 26, 40

security systems, 388–402, 405

large-scale, 93–94, 556

stimulus/response model, 613–14

modeling, 25, 138–166

system error, 307–09

sociotechnical, 291–95, 303, 556–63

system failure, 307

software design and, 47

acceptance of, 410

specification requirements, 120–29

availability and, 309–12

state representation, 155

costs of, 286

systems of systems (SoS) v., 581–82

critical systems, 287, 290, 297, 302, 340–41

types of, 18, 20–21, 24–26, 32, 40, 552

systems engineering, 20, 23, 40, 551–79

dependability and, 22, 268, 286–91, 303

conceptual design, 553, 563–66, 577

error and fault v., 308

development processes, 570–74, 577

hardware failure and, 287

enterprise systems, 552

human errors and, 287, 351–52

lifetimes and, 575–76

nondeterminism and, 560–61

range of disciplines, 554–55

reliability and, 307–12, 560–61

sociotechnical systems, 552, 556–63, 577

reparability and, 289

software engineering v., 20, 23, 40, 554

resilience and, 410–12, 420–21

spiral model for requirements, 572

safety-critical systems, 340–41

stages of, 553–54

security and, 22, 268, 397

system evolution, 575–76

sociotechnical, 560–61

system procurement (acquisition), 453–54, 566–70, 577

software failures and, 287, 340–41

specifications and, 310

technical computer-based systems, 552

‘Swiss cheese’ model of, 420–21

systems of systems (SoS), 25, 256, 442, 556, 580–609

‘Swiss cheese’ model of, 420–21

system fault, 307–09

architectural design, 595, 599–606, 607

system infrastructure frameworks, 446

classification of systems, 587–90, 606

system integration, 215–16

container systems, 603–05

system level (reuse), 214

data-feed systems, 602–03

system modeling, see models

deployment and integration of, 595, 597–99

system of system coalitions, 589

engineering, 593–99

system output, 268

governance complexity, 586–87, 588–90, 606

system requirements, 52, 102–03

system reuse, 438

interface development, 595–97

system selection, 594–95

large-scale systems, 556

system testing, 48, 59, 231–32, 240–42

management complexity, 585, 586–87, 587–90, 606

system versions, 323–25

system vision document, 565–66

reductionism, 590–93, 606

systems (software systems). See also distributed systems; embedded software systems; systems of systems (SoS)

software systems, 582

system complexity, 584–87, 606

system v., 581–82

activity models (diagram), 60, 61

technical complexity, 585, 586–87, 590

agile methods for, 93–96

trading systems, 605–06


T

statistical, 332–33, 336

system, 59, 232, 240–42

test-driven development (TDD), 242–45

tabular specification, 124

tool-based analysis, 404

task cards, 79–80, 82. See also user stories

unit testing, 47, 232–37

teamwork, 656–64

user testing, 249–51

development team, 85, 90, 92–93

validation, 58–60, 227–29

group cohesion, 658

threats, 377, 378, 404, 413, 414–15

group communication, 662–64

timeouts, 330–31

group member selection, 659–60

timestamps, 744

group organization, 660–62

timing analysis, 626–31, 635

hierarchical groups, 661–62

timing errors, 238–39

hiring people, 661

TOGAF, 600, 601

physical work environment and, 663

tool-based analysis, 404

technical complexity, SoS, 585, 586–87, 590

tool support, 132, 743, 744, 746

technical computer-based systems, 552

traceability (requirements), 132, 133

test cases, 130, 234–37, 252

trading systems, 605–06

test-driven development (TDD), 242–45

transaction-based applications, 25

test-first development, 59, 78, 81–83, 252

transaction processing systems, 185,

test planning, 231

186–87, 192

testing (software testing), 58–60, 226–54, 402–04, 427–28

transition phase (RUP), 46–47

triple modular redundancy (TMR), 322

acceptance, 77, 82, 249, 250–51, 252

trust, security and, 22, 24

agile methods for, 59, 78, 81–83, 251

two-tier client-server architecture, 501, 503–05

alpha, 249

assurance and, 402–04

automated, 78, 81–83, 233–34, 242, 252

U

beta, 58, 60, 249–50

choosing test cases, 234–37, 252

component testing, 59, 232, 237–39

UML (Unified Modeling Language), 140

customer, 58, 59

activity diagrams, 33–34, 141, 143–44

debugging v., 58, 232, 244

architectural design and, 139, 175, 205

defect, 58, 227–28, 232, 245, 248

behavioral models, 155–57

development and, 59–60, 81–83, 570

business processes and, 143–44

development testing, 231–42, 252

class diagrams, 141, 149–50

goals of, 227

component interface diagram, 469

incremental approach, 59

deployment diagrams, 149, 218

inspections v., 229–30

diagram types, 139, 140–41, 205

model of, 230–31

event-driven, 156–57

penetration, 403–04

executable (xUML), 162

plan-driven phases, 59–60

generalization and, 152

process, 58–60

interaction models, 144–49

release testing, 245–48

object oriented metrics and, 721

reliability and, 332–33, 336

object-oriented systems and, 140, 198–209

resilience, 427–28

package symbol, 37

security, 402–04

sequence diagrams, 141, 146–49, 155, 163, 205, 206–07

services, 543, 546–47

stages in, 59, 231

state diagrams, 141, 205, 207–08


UML (continued)

testing, 58–60, 227–29

subsystem models, 205–06

verification v., 227–29

system modeling using, 139, 140

validity checks, 129, 326–27, 399

use cases, 125–26, 141, 144–46, 163, 205

vehicle dispatcher system, 448–49

workflow models, 143–44, 544

velocity (Scrum), 85

unified user interface (UI), 596–97

verifiability, 129

Uniform Resource Locator (URL), 530–32, 539

verification (software verification)

unit testing, 47, 231, 232–37

cost effectiveness of, 357

Universal Description, Discovery, and Integration (UDDI), 526

formal methods and, 300, 356–59

goal of, 228

Universal Resource Identifiers (URIs), 471, 527

levels of confidence, 228–29

Unix systems, 183, 401

model checking, 300, 358–59

urgent changes, 260

safety engineering, 356–59

usability

validation v., 227–29

error tolerance, 289

version control (VC) systems, 731, 735, 753

patterns, 175

version management (VM), 215, 216, 731, 735–40, 753

requirements, 109–10

security guideline, 397–98

vertical software packages, 20

usage, component models and, 471

views, architectural, 173–75, 192

use cases, 125–26, 141, 144–46

Virtual Learning Environment (VLE), 38

interaction models, 144–46, 163, 200–01

virtual systems, 588

requirements specification and, 125–26

visibility of information, 325–26

testing, 240–41

volatile requirements, 132

UML diagram models, 141

VOLERE requirements engineering method, 123–24

user access, 392

vulnerability, 377, 378, 391, 401, 402

user actions, logging, 398

user-defined error checking, 360

user expectations, 228–29

user interface design, 62

W

user requirements, 55, 73–74, 102–03

user stories, 79–80, 82, 86, 247, 681–82

conceptual design and, 565–66

waterfall model, 45, 47–49

project planning (agile method) with, 681–82

weather information database, 531–32

task cards, 79–80

weather stations, see wilderness weather stations

user testing, 249–51

web application frameworks (WAFs), 444

utility services, 534, 548

web-based systems, 27–28

web services, 27, 52, 521, 524–33. See also services; WSDL

browser development, 27, 521

V

business process model and, 544–46, 548

business, 534, 541–47, 548

classification of, 534, 548

V & V (verification and validation), 58, 227–29, 356. See also testing; validation

clouds, 27, 532

components for, 526–29

V-model, 60

composition (construction) of, 541–47

vacation package workflow, 542, 544–45

coordination, 534, 548

validation (software validation), 20, 58–60, 69

defined, 27, 521

engineering activities for, 23, 44

http and https protocols, 530–31

requirements, 55, 129–30, 135

interactive transaction-based applications, 25


interfaces, 28, 528

state diagram, 207–08

resource operations, 530

station maintenance system, 37

RESTful approach and, 529–33, 544

system testing, 240–41

reusable components as, 52, 526–29, 542

use case model for, 200–01

service-oriented architecture (SOA) and, 524–29

work environments, 663

SOA approach, 524

work flow representation (UML), 143–44

software development and, 541–47, 548

workflow, 83, 452, 542, 543, 544–46, 548

standards, 525–26

wrapping, legacy system, 278, 442, 540

testing, 543, 546–47

WS-BPEL, 525, 526, 544, 546

utility, 534, 548

WSDL (Web Service Definition Language), 526, 527–29, 537, 540, 544

WSDL interface, 528

‘wicked’ problems, 130–31, 286, 301

message exchange, 527–29, 537

wilderness weather stations, 36–38

model elements, 527–28

architectural design of, 201–02

service deployment and, 540

availability and reliability of, 289

web service interface, 528

‘collect weather data’ sequence chart for, 241

context model for, 199

data collection (sequence diagram) in, 206

X

data collection system architecture in, 202

data management and archiving system, 36

environment of, 36–37

XML, 470, 525, 527–29

high-level architecture of, 201

language processing, 186, 189, 191, 470, 544

interface specification, 208–09

namespaces, 528–29

object class identification, 202–04

service descriptions, 528–29

object interface of, 233

web services and, 525

objects, 203–04

WS-BPEL workflow models, 544, 546

sequence diagram for, 241

WSDL message exchange, 527–29

sociotechnical system of, 291–92

XML-based protocols, 521


Author Index

A

B

Abbott, R., 202, 224

Badeau, F., 300, 304

Abdelshafi, I., 87, 100

Balcer, M. J., 162, 165

Abrial, J. R., 49, 71, 300, 304, 357, 370

Ball, T., 300, 305, 358, 361, 370

Abts, C., 459, 460, 462, 594, 608, 684, 688, 691, 694, 699

Bamford, R., 709, 729, 734, 755

Banker, R. D., 275, 282

Addy, E., 476, 489

Basili, V. R., 73, 100

Aiello, B., 731, 754, 755

Bass, B. M., 655, 666

Alexander, C., 209, 224

Bass, L., 169, 170, 175, 192, 194

Alford, M., 552, 579

Baumer, D., 446, 462

Ali Babar, M., 169, 194

Baxter, G., 559, 579

Allen, R., 459, 460, 463

Bayersdorfer, M., 221, 224

Ambler, S. W., 89, 95, 98, 99, 140, 162, 165

Beck, K., 71, 77, 80, 98, 99, 100, 203, 224, 242, 254, 279, 282, 680, 699

Ambrosio, A. M., 341, 372

Amelot, A., 300, 304

Beedle, M., 71, 85, 100

Anderson, E. A., 300, 305

Behm, P., 356, 371

Anderson, R. J., 495

Belady, L., 271

Anderson, R., 376, 402, 405, 406

Bell, R., 347

Andrea, J., 244, 254

Bellouiti, S., 87, 100

Andres, C., 98, 680, 699

Bennett, K. H., 257, 282

Appleton, B., 175, 194, 754

Benoit, P., 356, 371

Arbon, J., 252

Bentley, R., 125, 137

Arisholm, E., 84, 99

Berczuk, S. P., 175, 194, 754

Armour, P., 696

Bernstein, A. J., 186, 195

Arnold, S., 552, 579

Bernstein, P. A., 498, 519

Ash, D., 275, 282

Berry, G., 612, 637

Atlee, J. M., 135

Bezier, B., 235, 254

Avizienis, A. A., 286, 303, 304,

Bicarregui, J., 300, 302, 303, 305

323, 338

Bird, J., 280

Bird, J., 90, 100
Bishop, P., 361, 371
Bjorke, P., 563, 579
Blair, G., 491, 517, 519
Bloomfield, R. E., 361, 371
Bochot, T., 300, 305, 358, 371
Boehm, B. W., 40, 45, 48, 71, 98, 227–28, 254, 459, 460, 462, 594, 608, 649, 666, 683, 684, 687, 688, 691, 694, 695, 697, 699
Bollella, G., 619, 637
Booch, G., 140, 165, 166, 170, 193, 194
Bosch, J., 169, 173, 180, 194
Bott, F., 31, 42
Bounimova, E., 300, 305
Brambilla, M., 139, 159, 163, 165
Brant, J., 80, 100, 279, 282
Brazendale, J., 347
Brereton, P., 517
Brilliant, S. S., 324, 338
Brook, P., 552, 579
Brooks, F. P., 665
Brown, A. W., 98, 684, 688, 699
Brown, L., 376, 405, 407
Bruno, E. J., 619, 637
Budgen, D., 517
Burns, A., 619, 631, 634, 635, 637
Buschmann, F., 175, 194, 195, 209, 224, 225
Buse, R. P. L., 726, 729

C

Cabot, J., 139, 159, 163, 165, 488
Calinescu, R. C., 300, 305, 583, 592, 609
Carollo, J., 252
Cha, S. S., 349, 371
Chapman, C., 220, 225
Chapman, R., 300, 305, 404, 406
Chaudron, M. R. V., 175, 195
Checkland, P., 559, 579
Chen, L., 169, 194
Cheng, B. H. C., 135
Chidamber, S., 721, 729
Chrissis, M. B., 67, 71, 734, 755
Christerson, M., 125, 137, 144, 165
Chulani, S., 684, 688, 699
Clark, B. K., 683, 684, 688, 691, 694, 699
Cleaveland, R., 371
Clements, P., 169, 170, 175, 192, 194
Cliff, D., 583, 592, 609
Cloutier, R., 563, 579
Cohn, M., 680, 697, 699
Coleman, D., 275, 282
Collins-Sussman, B., 216, 225, 735, 755
Connaughton, C., 727
Conradi, R., 69
Cook, B., 300, 305
Cooling, J., 627, 637
Coplien, J. O., 175, 194
Coulouris, G., 491, 517, 519
Councill, W. T., 467, 489
Crabtree, A., 117, 137
Cranor, L., 398, 406
Crnkovic, I., 487, 488
Cunningham, W., 84, 100, 203, 224
Curbera, F., 544, 550
Cusamano, M., 231, 254

D

Daigneau, R., 548
Dang, Y., 719, 726, 729
Datar, S. M., 275, 282
Davidsen, M. G., 272, 282
Davis, A. M., 102, 137
Deemer, P., 88, 100
Dehbonei, B., 356, 371
Deibler, W. J., 709, 729, 734, 755
Delmas, D., 356, 372
Delseny, H., 356, 357, 372
DeMarco, T., 665
den Haan, J., 159, 165
Devnani-Chulani, S., 688, 691, 694, 699
Dijkstra, E. W., 227, 254
Dipti, 282
Dollimore, J., 491, 517, 519
Douglass, B. P., 299, 305, 617, 620, 637
Duftler, M., 544, 550
Dunteman, G., 655, 666
Duquenoy, P., 31, 42
Dybå, T., 69, 84, 99

E

Ebert, C., 611, 635, 637
Edwards, J., 507, 519
El-Emam, K., 721, 729
Ellison, R. J., 425, 432, 434
Erickson, J., 140, 165
Erl, T., 526, 534, 548, 550
Erlikh, L., 256, 282

F

Fagan, M. E., 230, 254, 713, 729
Fairley, R. E., 563, 579
Faivre, A., 356, 371
Fayad, M. E., 446, 462
Fayoumi, A., 602, 609
Feathers, M., 280
Fielding, R., 530, 550
Firesmith, D. G., 383, 406
Fitzgerald, J., 300, 302, 303, 305, 735, 755
Fitzpatrick, B., 216, 225
Fogel, K., 222
Fowler, M., 80, 100, 279, 282
Fox, A., 517
Frank, E., 726, 729
Freeman, A., 28, 42

G

Gabriel, R. P., 581, 583, 607, 609
Gagne, G., 616, 637
Galin, D., 727
Gallis, H., 84, 99
Galvin, P. B., 616, 637
Gamma, E., 175, 194, 209, 210, 222, 225, 444, 463
Garfinkel, S., 398, 406
Garlan, D., 172, 175, 191, 192, 195, 459, 460, 461, 463
Gokhale, A., 443, 445, 463
Gotterbarn, D., 29, 40, 42
Graydon, P. J., 362, 371
Gregory, G., 82, 100, 233, 243, 254
Griss, M., 443, 463, 478, 489
Gryczan, G., 446, 462

H

Hall, A., 300, 305, 404, 406
Hall, E., 644, 666
Hall, M. A., 726, 729
Hamilton, S., 358, 371
Han, S., 719, 726, 729
Harel, D., 156, 165, 617, 637
Harford, T., 726, 729
Harkey, D., 507, 519
Harrison, N. B., 175, 194
Hatton, L., 325, 338
Heimdahl, M. P. E., 300, 305
Heineman, G. T., 467, 489
Helm, R., 175, 194, 209, 210, 222, 225, 444, 463
Henney, K., 175, 194, 209, 224
Heslin, R., 663, 666
Hitchins, D., 581, 608
Hnich, B., 487
Hofmeister, C., 174, 195
Holdener, A. T., 28, 42, 445, 463, 512, 519
Hollnagel, E., 409, 417–18, 434
Holtzman, J., 552, 579
Holzmann, G. J., 336, 358, 371
Hopkins, R., 94, 100, 256, 282
Horowitz, E., 683, 684, 688, 699
Howard, M., 405
Hudepohl, J. P., 360, 372
Hull, R., 151, 165
Humphrey, W., 67, 702, 713, 729
Hutchinson, J., 162, 165

I

Ince, D., 709, 729

J

Jackson, K., 552, 579
Jacobson, I., 24, 41, 42, 125, 137, 140, 144, 165, 166, 443, 463, 478, 489
Jain, P., 175, 195, 209, 225
Jeffrey, R., 727
Jeffries, R., 81, 84, 100, 140, 165, 242, 254
Jenkins, K., 94, 100, 256, 282
Jenney, P., 404, 407
Jhala, R., 300, 305, 358, 371
Joannou, D., 602, 609
Johnson, D. G., 31, 42
Johnson, R., 175, 194, 209, 210, 222, 225, 444, 463
Jones, C., 280, 611, 635, 637
Jones, T. C., 256, 282
Jonsson, P., 125, 137, 144, 165, 443, 463, 478, 489
Jonsson, T., 487

K

Kaner, C., 246, 254
Kawalsky, R., 602, 609
Kazman, R., 169, 170, 175, 192, 194
Keen, J., 583, 592, 609
Kelly, T., 583, 592, 609
Kemerer, C. F., 275, 282, 721, 729
Kennedy, D. M., 563, 579
Kerievsky, J., 279, 282
Kessler, R. R., 84, 100
Khalaf, R., 544, 550
Kifer, M., 186, 195
Kilner, S., 90, 100
Kindberg, T., 491, 517, 519
King, R., 151, 165
Kircher, M., 175, 195, 209, 225
Kitchenham, B., 718, 727, 729
Kiziltan, Z., 487
Klein, M., 581, 583, 607, 609
Kleppe, A., 485, 489
Knight, J. C., 324, 338, 362, 371
Knoll, R., 446, 462
Koegel, M., 161, 165
Konrad, M., 67, 71, 734, 755
Kopetz, H., 635
Korfiatis, P., 563, 579
Koskela, L., 59, 71
Koskinen, J., 275, 282
Kotonya, G., 473, 489
Kozlov, D., 275, 282
Krogstie, J., 272, 282
Krutchen, P., 46, 71, 173, 175, 195
Kuehl, S., 552, 579
Kumar, Y., 280
Kwiatkowska, M. Z., 300, 305, 358, 371, 583, 592, 609

L

Lamport, L., 495
Landwehr, C., 286, 303, 304
Lane, A., 392, 407
Lange, C. F. J., 175, 195
Laprie, J. C., 286, 303, 304, 409, 434
Larman, C., 73, 100, 222
Larsen, P. G., 300, 302, 303, 305
Lau, K-K., 466, 470, 487, 489
Laudon, K., 31, 42
LeBlanc, D., 405
Ledinot, E., 357, 371
Lee, E. A., 612, 637
Leffingwell, D., 95, 100
Lehman, M., 271
Leme, F., 82, 100, 233, 243, 254
Leveson, N. G., 324, 338, 349, 368, 371
Levin, V., 300, 305, 358, 361, 370
Lewis, B., 521, 550
Lewis, P. M., 186, 195
Leymann, F., 532, 550
Lichtenberg, J., 300, 305
Lidman, S., 41, 42
Lientz, B. P., 256, 282
Lilienthal, C., 446, 462
Linger, R. C., 230, 254, 332, 338, 425, 432, 434
Lipson, H., 425, 434
Lister, T., 665

Loeliger, J., 216, 225, 735, 755
Lomow, G., 524, 550
Longstaff, T., 425, 432, 434
Loope, J., 753, 755
Lopes, R., 359, 371
Lou, J-G., 719, 726, 729
Lovelock, C., 521, 550
Lowther, B., 275, 282
Lutz, R. R., 238, 254, 340, 371
Lyu, M. R., 336, 338

M

Madachy, R., 683, 684, 688, 699
Madeira, H., 341, 372
Maier, M. W., 582, 583, 588, 589, 599–600, 607, 609
Majumdar, R., 300, 305, 358, 371
Marciniak, J. J., 69
Markkula, J., 275, 282
Marshall, J. E., 663, 666
Martin, D., 117, 137, 175, 195
Martin, R. C., 244, 254
Maslow, A. A., 383, 666
Massol, V., 82, 100, 233, 243, 254
McCay, B., 552, 579
McComb, S. A., 563, 579
McConnell, S., 713, 729
McCullough, M., 216, 225, 735, 755
McDermid, J., 583, 592, 609
McDougall, P., 510, 519
McGarvey, C., 300, 305
McGraw, G., 333, 338, 396, 407
McMahon, P. E., 41, 42
Mead, N. R., 425, 434
Mejia, F., 356, 371
Mellor, S. J., 159, 162, 165
Melnik, G., 81, 100, 242, 254
Menzies, T., 719, 725, 726, 727, 729
Meunier, R., 175, 194, 209, 225
Meyer, B., 485, 489
Meynadier, J-M., 356, 371
Miers, D., 544, 550
Mili, A., 476, 489
Mili, H., 476, 489
Miller, K., 29, 40, 42
Miller, S. P., 300, 305
Mitchell, R. M., 263, 282
Monate, B., 357, 371
Monk, E., 455, 463
Moore, A., 425, 432, 434
Morisio, M., 453, 460, 461, 463
Mostashari, A., 563, 579
Moy, Y., 357, 371
Mulder, M., 87, 100
Musa, J. D., 334, 338
Muskens, J., 175, 195

N

Nagappan, N., 360, 372
Nascimento, L., 461
Natarajan, B., 443, 445, 463
Naur, P., 19, 42
Newcomer, E., 524, 550
Ng, P-W., 41, 42
Nii, H. P., 180, 195
Nord, R., 174, 195
Norman, G., 358, 371
Northrop, L., 581, 583, 607, 609
Nuseibeh, B., 169, 194

O

O’Hanlon, C., 274, 282
Ockerbloom, J., 459, 460, 463
Oliver, D., 552, 579
Oman, P., 275, 282
Ondrusek, B., 300, 305
Opdahl, A. L., 386, 407
Opdyke, W., 80, 100, 279, 282
Oram, A., 510, 517, 519
Orfali, R., 507, 519
Ould, M., 644, 666
Overgaard, G., 125, 137, 144, 165
Owens, D., 552, 579

P

Paige, R., 583, 592, 609
Paries, J., 432, 434
Parker, D., 358, 371
Parnas, D., 296, 302, 305
Patel, S., 162, 166
Patterson, D., 517
Pautasso, C., 532, 550
Perrow, C., 342, 343, 371
Pfleeger, C. P., 376, 377, 407
Pfleeger, S. L., 376, 377, 407
Pilato, C., 216, 225, 735, 755
Pooley, R., 126, 137, 163
Poore, J. H., 230, 254, 332, 338
Pope, A., 466, 489, 493, 519
Prowell, S. J., 230, 254, 332, 338
Pullum, L., 318, 336, 338

Q

Quinn, M. J., 40

R

Rajamani, S. K., 300, 305, 358, 361, 370
Rajlich, V. T., 257, 282
Randell, B., 19, 42, 286, 303, 304, 307, 338
Ray, A., 368
Rayhan, S., 727
Raymond, E. S., 219, 225
Reason, J., 418, 420–21, 434
Regan, P., 358, 371
Reifer, D., 684, 688, 699
Richardson, L., 531, 550
Riehle, D., 446, 462
Rittel, H., 130, 137, 562, 579, 592, 609
Ritter, G., 637
Roberts, D., 80, 100, 279, 282
Robertson, J., 123, 135, 137
Robertson, S., 123, 135, 137
Rodden, T., 125, 137
Rodriguez, A., 548
Rogerson, S., 29, 40, 42
Rohnert, H., 175, 194, 195, 209, 225
Rosenberg, F., 544, 550
Rouncefield, M., 162, 165
Royce, W. W., 47, 71, 98, 687, 699
Rubin, K. S., 78, 85, 98, 100, 680, 699
Ruby, S., 531, 550
Rumbaugh, J., 140, 165, 166
Ryan, P., 754

S

Sachs, S., 731, 754, 755
Sakkinen, M., 275, 282
Sametinger, J., 470, 472, 489
Sami, M., 69
Sanderson, D., 516, 519
Sarris, S., 445, 463, 512, 519
Sawyer, P., 125, 137
Scacchi, W., 69
Schatz, B., 87, 100
Schmidt, D. C., 175, 194, 195, 209, 224, 225, 443, 445, 446, 462, 463, 581, 583, 607, 609
Schneider, S., 357, 371
Schneier, B., 384, 396, 407
Schoenfield, B., 392, 407
Schuh, P., 754
Schwaber, K., 71, 85, 100
Scott, J. E., 456, 463
Scott, K., 159, 165
Selby, R. W., 231, 254, 683, 699
Shaw, M., 172, 175, 191, 192, 195
Shimeall, T. J., 349, 371
Shou, P. K., 296, 305
Shrum, S., 67, 71, 734, 755
Siau, K., 140, 165
Silberschatz, A., 616, 637
Sillitto, H., 578, 596, 600, 609
Silva, N., 341, 359, 371, 372
Sindre, G., 386, 407
Sjøberg, D. I. K., 69, 84, 99
Smart, J. F., 743, 755
Snipes, W., 360, 372
Sommerlad, P., 175, 194, 209, 225
Sommerville, I., 117, 135, 137, 175, 195, 461, 559, 579, 583, 592, 607, 609
Soni, D., 174, 195

Souyris, J., 356, 372
Spafford, E., 399, 401, 407
Spence, I., 41, 42
St. Laurent, A., 220, 225
Stafford, J., 488
Stahl, T., 159, 166
Stal, M., 205, 209, 225
Stallings, W., 376, 405, 407, 616, 637
Stapleton, J., 71, 100
Steece, B., 684, 688, 699
Stevens, P., 126, 137, 163
Stevens, R., 552, 579, 583, 609
Stewart, J., 574, 579
Stoemmer, P., 697
Storey, N., 349, 372
Strunk, E. A., 362, 371
Suchman, L., 117, 137
Swanson, E. B., 256, 282
Swartz, A. J., 294, 305
Szyperski, C., 467, 474, 487, 488, 489

T

Tahchiev, P., 82, 100, 233, 243, 254
Tanenbaum, A. S., 491, 519
Tavani, H. T., 31, 42
Thayer, R. H., 552, 563, 579
Tian, Y., 602, 609
Torchiano, M., 453, 460, 461, 463
Torres-Pomales, W., 318, 338
Trammell, C. J., 230, 254, 332, 338
Trimble, J., 299, 305
Tully, C., 552, 579
Turner, M., 517
Turner, R., 45, 71
Twidale, M., 125, 137

U

Ulrich, W. M., 276, 282
Ulsund, T., 69
Ustuner, A., 300, 305

V

Valeridi, R., 697
van Schouwen, J., 296, 305
Van Steen, M., 491, 519
van Vliet, M., 87, 100
Vandermerwe, S., 521, 550
Veras, P. C., 341, 372
Vicente, D., 359, 371
Viega, J., 396, 405, 407
Vieira, M., 341, 372
Villani, E., 341, 372
Viller, S., 117, 137
Virelizier, P., 300, 305, 358, 371
Vlissides, J., 175, 194, 209, 210, 222, 225, 444, 463
Voas, J., 333, 338
Voelter, M., 159, 166
Vogel, L., 32, 42, 218, 225
Vouk, M. A., 360, 372

W

Waeselynck, H., 300, 305, 358, 371
Wagner, B., 455, 463
Wagner, L. G., 300, 305
Wallach, D. S., 512, 519
Wang, Z., 466, 470, 487, 489
Warmer, J., 485, 489
Warren, I., 266, 282
Webber, M., 130, 137, 562, 579, 592, 609
Weils, V., 356, 357, 358, 372
Weinberg, G., 83, 100
Weiner, L., 203, 225
Weinreich, R., 470, 472, 489
Weise, D., 159, 165
Wellings, A., 619, 631, 634, 635, 637
Westland, C., 683, 699
Whalen, M. W., 300, 305
Wheeler, D. A., 396, 407
Wheeler, W., 52, 71, 473, 489
White, J., 52, 71, 473, 489
White, S. A., 544, 550
White, S., 552, 579
Whittaker, J. A., 237, 242, 252, 254

Whittle, J., 162, 165
Wiels, V., 300, 305, 371
Wilkerson, B., 203, 225
Willey, A., 552, 579
Williams, L., 84, 100, 360, 372
Williams, R., 574, 579
Wimmer, M., 139, 159, 163, 165
Wirfs-Brock, R., 203, 225
Witten, I. H., 726, 729
Woodcock, J., 300, 302, 303, 305
Woods, D., 432, 434
Wreathall, J., 432, 434
Wysocki, R. K., 665

X

Xie, T., 719, 726, 729

Y

Yacoub, S., 476, 489
Yamaura, T., 252

Z

Zelkowitz, M., 637
Zhang, D., 719, 726, 729
Zhang, H., 719, 726, 729
Zhang, Y., 162, 166
Zheng, J., 360, 372
Zimmermann, O., 532, 550
Zimmermann, T., 719, 725, 726, 727, 729
Zullighoven, H., 446, 462
Zweig, D., 275, 282

Document Outline

Cover

Title Page

Copyright Page

Preface

Acknowledgements

Contents at a glance

Dedication

Contents

Part 1 Introduction to Software Engineering

Chapter 1 Introduction

1.1 Professional software development

1.2 Software engineering ethics

1.3 Case studies

Chapter 2 Software processes

2.1 Software process models

2.2 Process activities

2.3 Coping with change

2.4 Process improvement

Chapter 3 Agile software development

3.1 Agile methods

3.2 Agile development techniques

3.3 Agile project management

3.4 Scaling agile methods

Chapter 4 Requirements engineering

4.1 Functional and non-functional requirements

4.2 Requirements engineering processes

4.3 Requirements elicitation

4.4 Requirements specification

4.5 Requirements validation

4.6 Requirements change

Chapter 5 System modeling

5.1 Context models

5.2 Interaction models

5.3 Structural models

5.4 Behavioral models

5.5 Model-driven architecture

Chapter 6 Architectural design

6.1 Architectural design decisions

6.2 Architectural views

6.3 Architectural patterns

6.4 Application architectures

Chapter 7 Design and implementation

7.1 Object-oriented design using the UML

7.2 Design patterns

7.3 Implementation issues

7.4 Open-source development

Chapter 8 Software testing

8.1 Development testing

8.2 Test-driven development

8.3 Release testing

8.4 User testing

Chapter 9 Software evolution

9.1 Evolution processes

9.2 Legacy systems

9.3 Software maintenance

Part 2 System Dependability and Security

Chapter 10 Dependable systems

10.1 Dependability properties

10.2 Sociotechnical systems

10.3 Redundancy and diversity

10.4 Dependable processes

10.5 Formal methods and dependability

Chapter 11 Reliability engineering

11.1 Availability and reliability

11.2 Reliability requirements

11.3 Fault-tolerant architectures

11.4 Programming for reliability

11.5 Reliability measurement

Chapter 12 Safety engineering

12.1 Safety-critical systems

12.2 Safety requirements

12.3 Safety engineering processes

12.4 Safety cases

Chapter 13 Security engineering

13.1 Security and dependability

13.2 Security and organizations

13.3 Security requirements

13.4 Secure systems design

13.5 Security testing and assurance

Chapter 14 Resilience engineering

14.1 Cybersecurity

14.2 Sociotechnical resilience

14.3 Resilient systems design

Part 3 Advanced Software Engineering

Chapter 15 Software reuse

15.1 The reuse landscape

15.2 Application frameworks

15.3 Software product lines

15.4 Application system reuse

Chapter 16 Component-based software engineering

16.1 Components and component models

16.2 CBSE processes

16.3 Component composition

Chapter 17 Distributed software engineering

17.1 Distributed systems

17.2 Client–server computing

17.3 Architectural patterns for distributed systems

17.4 Software as a service

Chapter 18 Service-oriented software engineering

18.1 Service-oriented architecture

18.2 RESTful services

18.3 Service engineering

18.4 Service composition

Chapter 19 Systems engineering

19.1 Sociotechnical systems

19.2 Conceptual design

19.3 System procurement

19.4 System development

19.5 System operation and evolution

Chapter 20 Systems of systems

20.1 System complexity

20.2 Systems of systems classification

20.3 Reductionism and complex systems

20.4 Systems of systems engineering

20.5 Systems of systems architecture

Chapter 21 Real-time software engineering

21.1 Embedded system design

21.2 Architectural patterns for real-time software

21.3 Timing analysis

21.4 Real-time operating systems

Part 4 Software Management

Chapter 22 Project management

22.1 Risk management

22.2 Managing people

22.3 Teamwork

Chapter 23 Project planning

23.1 Software pricing

23.2 Plan-driven development

23.3 Project scheduling

23.4 Agile planning

23.5 Estimation techniques

23.6 COCOMO cost modeling

Chapter 24 Quality management

24.1 Software quality

24.2 Software standards

24.3 Reviews and inspections

24.4 Quality management and agile development

24.5 Software measurement

Chapter 25 Configuration management

25.1 Version management

25.2 System building

25.3 Change management

25.4 Release management

Glossary

A

B

C

D

E

F

G

H

I

J

L

M

N

O

P

Q

R

S

T

U

V

W

X

Z

Subject index

A

B

C

D

E

F

G

H

I

J

L

M

N

O

P

Q

R

S

T

U

V

W

X

Author index

A

B

C

D

E

F

G

H

I

J

K

L

M

N

O

P

Q

R

S

T

U

V

W

X

Y

Z